Chapter 1


INTRODUCTION 

In recent years, there has been a trend toward statistically based contract 
specifications in an effort to continually improve product quality, and provide additional 
value for the cost. The AASHO Road Test of 1958-1961 produced a sufficient number of 
unbiased test results of construction materials and the techniques used to install them to 
show for the first time their variability and relationship to the specifications. The results of
these tests clearly demonstrated that the significance of certain items in the
specifications simply was not known, nor was the real standard or level of quality the
specifications were supposed to guarantee [TRB, 1976, p 3]. This was the period in
which the concept of performance based, or end-result, specifications was born, along with
the recognition that a contract written with minimum standards would likely result in the same. The Blatnik
Committee’s discovery in 1962 that there was not 100 percent compliance with
specifications almost led to Congress passing a law making it a federal offense to
knowingly incorporate nonspecification material in a highway project [TRB, 1976, p 3].
These events were the genesis of today’s developing sampling plans that estimate the true
characteristics of materials and construction methods for which the specifications are
written [TRB, 1976, p 3]. The only drawback to statistical sampling is that without a
basic understanding of its characteristics and nuances, it can lead to undesirable
consequences that may not be readily apparent to those designing and implementing the
plan.
The research for this study put considerable emphasis on comparing the Washington
State Department of Transportation (WSDOT) Standard Specification with Military Standard 414 (MIL-STD-414).
At the end of the project, it was discovered, unexpectedly, that the lack of association
between the WSDOT specification and MIL-STD-414 was by design. In other words, the
original intention of the specification writers was not to mirror exactly the sampling
methods in MIL-STD-414, even though it appeared at first that it was. The primary reason
for this was the recognition by the plan designers that the sample sizes that would likely be
required using MIL-STD-414 simply were not economically feasible, which
necessitated a small-sample testing methodology. As will be demonstrated in
this report, the benefits of following MIL-STD-414 to the letter are lost, but the
economic payback of smaller sample sizes compensates for that loss.


Statistical Sampling 

This report was written to provide insight into using statistical sampling methods, 
their advantages and disadvantages, the pitfalls of equating expected pay to risk, as well as 
provide contractors an explanation of their responsibilities and the advantages to both 
contracting parties in a properly designed acceptance plan. 

Statistical sampling plans are a tool by which a reasonable estimate of product 
quality can be made by measuring the characteristics of a randomly selected sample. 
Different sampling plans require different sample sizes for comparable levels of confidence 
in the results. It is here, the preliminary design stage, that a decision must be made to
determine if it will be more expensive to make easy and quick measurements of a larger
sample, or more meticulous measurements of a smaller sample. The results of the sample
measurements allow the inspector to make a decision, or a judgment sometimes called
“sentencing” [Montgomery, 1991, p 551], about the body of material from which the
sample was taken. Acceptance sampling is just what it says. It is not to be used to control
the contractor’s process capabilities. It is simply a means to decide if an owner should
accept what the contractor is providing. The contractor can just as easily employ a
statistical sampling technique to control the quality of the product before subjecting it to
an owner’s plan. These concepts will be discussed in more detail later in the report.


WA-RD 326.1 

In 1989, the Washington State Department of Transportation (WSDOT) elected to 
implement quality assurance specifications on several asphalt paving projects. This was a 
test case in an effort to remove bias from inspection, and ensure a predictable level of 
quality. Positive feedback from both the contractors and state employees encouraged 
WSDOT to continue and broaden the use of statistically based specifications. The intent 
of WA-RD 326.1, “An Initial Evaluation of the WSDOT Quality Assurance Specifications 
for Asphalt Concrete” was to determine quantitatively any real changes in pavement 
quality as a result of the new specifications. The new specifications did indeed produce a
modest improvement in quality based on six projects, three QA and three non-QA
[Markey et al, 1994, p 1].


Quantifying Risk 
One aspect of this report’s research was to take WA-RD 326.1 one step further in
an attempt to specifically quantify the statistical risks to both WSDOT and the
contractors who operated under the new specification. In addition, the appropriateness of
the pay factors for varying levels of product quality and sample sizes was also examined.


Insight for Developing a Sampling Plan 

This report discusses how to develop a statistically based sampling
plan. It will consider the costs of sampling and the relative impact of sample size, how to
quantify what is acceptable or rejectable quality, and the best way to tie pay to quality
level. The pitfalls of not properly applying established and defensible sampling methods
will be identified, along with how to avoid them. A straightforward explanation of the
concepts behind different sampling procedures will also be given, including when it is appropriate to
use or avoid them.





Chapter 2 


BACKGROUND 
The research for this project began with a literature search of all materials dealing
with statistically based specifications relating to construction. It was soon discovered that
the most relevant sources of information were Duncan, Montgomery, and MIL-STD-414.
These were not necessarily construction oriented, but provided the background necessary
for grasping the concepts inherent to statistical sampling. The sections that follow in this
chapter provide the building blocks for understanding statistically based sampling.


OC Curves and How They are Developed 

A properly designed acceptance plan, whether for variables sampling or attributes, 
can be represented by an operating characteristic (OC) curve. A variables sampling plan is 
one which tests and measures characteristics of the item sampled. It bases the
decision to accept or reject on one characteristic at a time, from data which are computed 
such as mean, standard deviation, or percent defective. An attributes sampling plan is a 
go/no go approach. In this method, several characteristics may be measured, but the final 
result is simply acceptance or rejection for the sample item. Attributes procedures tend to 
involve things that are counted. Both sampling methods will be described in more detail 
later. The OC Curve represents how well the sampling plan discriminates against a 
defective product. In other words, given a specific sample size, and material with a certain 
quality level (percent defective), then the probability of accepting the material from which
the sample was taken, at that quality level, can be read directly from the curve. There is
only one curve for each sample size. The entire curve demonstrates how the probability of
acceptance changes with either an increase or decrease in product quality for that
particular sampling plan. An OC Curve that has been constructed properly has the ability
to account for any uncertainty associated with the fact that only a small portion of each lot
is sampled [Weed, 1995, p 2]. A properly designed acceptance plan is
simply one which was created following the guidelines in an accepted standard such as 
MIL-STD-414, “Sampling Procedures and Tables for Inspection by Variables for Percent 
Defective”, or MIL-STD-105, “Sampling Procedures and Tables for Inspection by 
Attributes”, or the principles of statistically based acceptance procedures outlined in most 
statistics books. There is enough flexibility built into these guidelines to be able to apply 
them in a wide variety of situations. If a sampling plan designer substantially departs from
the guidelines indicated in the standards above, it may become impossible to accurately 
determine the plan’s discriminatory power, and the risks assigned to the contractor and 
owner. Many plans tie the quality of a product, determined by the sampling plan, to how 
much the contractor will be paid. Generally, for a plan to work properly, there should be 
a bonus for exceptional quality, and substantially reduced pay at the level of quality that is 
just above the level where it would be rejected. The quality levels which determine these 


points will be discussed later. 





α and β Risks

The concept of the amount of risk assigned to each party in a contract is described
by the quantities α and β. These terms are also known as Type I and Type II errors, or
more meaningfully, as seller's and buyer's risk respectively. An α error is one in which a
true hypothesis is rejected, and a β error is one in which a false hypothesis is accepted
[Mahoney, 1993, p 26]. In terms relating to contractors and owners, α risk is the chance
that an owner might reject material from a contractor that should be accepted (seller's
risk), and β risk is the chance that an owner will accept material that should be rejected
(buyer's risk). Unlike a normal, binomial, hypergeometric, or other type of distribution
curve, an operating characteristic curve does not represent probability by area under the
curve. Instead the chance, or probability of acceptance, is the distance from the curve
down to the X axis, read from the Y axis. So the Y axis will always be a scale of the
probability of acceptance from 0.00 to 1.00, and the X axis will represent the quality of
the material either in terms of how much is “good”, i.e. percent within limits, or how much
is “bad”, i.e. percent defective or fraction defective. Taking this a step further, the
distance from the curve upwards to 1.00 is the probability of rejection. Sometimes the Y
axis is represented by the term 1-α. Therefore an OC Curve representing a specific
sample size can be used to determine exactly how likely it is that a body of material, or lot,
with a specific percent defective will be accepted or rejected. Figure 2-1 below
is an example of a typical OC Curve.





Figure 2-1 Typical OC Curve (plot title: Operating Characteristic Curve)


This OC Curve says that at 0.5% defective (or 0.005 fraction defective), the
probability of acceptance is 98.6%, at 2% defective the probability of acceptance is
67.7%, and at 7% defective the probability of acceptance is 2.6%. In an ideal sampling
plan, a level of quality will have been established that is “acceptable”. This will be at some
point lower than perfect quality, because it is unreasonable to expect a contractor to be
able to produce material completely free of defects. Recognizing this, an ideal OC Curve
would accept material 100% of the time that is at or above the acceptable quality level.
See Willenbrock Volume II for a more detailed discussion of the ideal OC Curve. The
corresponding OC Curve would then reduce the probability of rejection to zero, or
probability of acceptance to 100%, for all material at or above the acceptable quality level.
Likewise, the perfect sampling plan and OC Curve would reject everything below the
acceptable quality level. Figure 2-2 below graphically demonstrates this concept where
the acceptable quality level is 5% defective. This means that the contractor may provide
material that is up to, but no more than, 5% defective with complete confidence that it will
be accepted by the owner. Acceptable quality level, rejectable quality level, and zero
defects will be discussed in more detail later.


Figure 2-2 Ideal OC Curve (plot title: Ideal O.C. Curve for AQL of 5%)


Identifying the Correct Statistical Model 

Statistics describe different characteristics of naturally occurring data by first 
classifying them into a specific data distribution. Distribution curves may have different 
shapes, and will have different equations which describe their behavior. It is important 
that an acceptance plan designer understand enough about the process from which 
samples will be drawn so that the appropriate distribution is applied. Many times,
simplifying assumptions are made that substitute one distribution for another, such as
assuming that the data is normally distributed. Only when the plan designer understands
when the disparity between “actual and assumed” is negligible can the substitution be
made with the confidence that it will not undermine the validity and enforceability of the
plan. The following descriptions of binomial and hypergeometric distributions will help.


Binomial Distributions 

The “normal” distribution, which is used more commonly than other distributions,
represents continuous data. Discrete data is represented by the binomial and
hypergeometric distributions, among others. Most field measurements are considered
continuous, limited only by the degree of precision of the instrument. These distributions
are subsequently representative of the “pool” of data from which lots and samples are
drawn. It is possible for a binomial or hypergeometric distribution to take on nearly the
same shape as a normal distribution, and in many cases the normal distribution is a close approximation.
Normally distributed data is easier to manipulate, so making the assumption that the data
is normally distributed is common. As has been the experience of those involved in
construction, the vast majority of construction characteristics are in fact normally
distributed, so this simplifying assumption is not a stretch of reality. Typically field data
tends to not be normally distributed only when there is some sort of physical limitation
such as zero percent air voids, or minimum cover over reinforcing steel. An example of
discrete data and continuous data is included later.

Normally distributed data comes from a universe that is infinite in size. A binomial
distribution is the probability distribution for a continuous, or theoretically infinite, process
operating randomly over time. The random operation can be visualized as one which
produces some product where on average, say, 5% are defective. So if you were to draw
lots from this process, each lot would on average have 5% defective. This is how a
consumer would view the operating characteristics of his sampling plan when buying a
steady stream of material from a supplier [Duncan, 1986, pp 164, 165].


Type B OC Curves 
There are two categories of operating characteristic curves, Type A and B. Type 
B curves are built from probability of lot acceptance based on the binomial distribution. 


The formula for the binomial distribution is 


$$P(X/n) = \frac{n!}{X!\,(n-X)!}\,(p')^{X}\,(1-p')^{n-X}$$     Equation 2-1


where P(X/n) = probability of X nonconforming in a sample of n items
X = number of items nonconforming in the sample
n = sample size
p' = lot fraction defective
[Duncan, 1986, pp 90-91]
For instance, if the sample size is 10, and it is known that the lot has 5% of its items 
defective, and 3 of the 10 items sampled were found defective, Equation 2-1 would give 
the probability of finding those 3 defective items. As will be demonstrated later, it is the 


summation of probabilities from zero defective, up to the designer’s tolerance, that yields 


the probability of acceptance. 
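
The worked case above (a sample of 10 from a lot that is 5% defective, with 3 defective items found) can be evaluated directly. The following is a minimal sketch, not part of the original report; the function names `binomial_pmf` and `cumulative_acceptance` are hypothetical and chosen only for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x nonconforming items in a sample of n,
    for lot fraction defective p (the binomial formula, Equation 2-1)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def cumulative_acceptance(c, n, p):
    """Sum of the probabilities of 0, 1, ..., c nonconforming items,
    where c stands for the designer's tolerance (assumed name)."""
    return sum(binomial_pmf(x, n, p) for x in range(c + 1))

# Sample of 10 items, lot 5% defective, 3 defective items in the sample.
p_three = binomial_pmf(3, 10, 0.05)
print(round(p_three, 4))  # 0.0105
```

Finding 3 defective items in a sample of 10 is therefore quite unlikely (about 1%) when the lot really is 5% defective.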


Hypergeometric Distributions 

Unlike the binomial distribution, the hypergeometric distribution is much more
limited in scope. The data it represents is assumed to have been drawn from a pool that is
limited, or finite, in size, and the samples drawn from it are not replaced. This would
apply to situations such as a one-time product run, or an item that is manufactured between
changes affecting production. This is also how a consumer would view the operating
characteristics of a sampling plan when isolated lots of material are purchased, or when
the consumer thinks about the quality of individual lots [Duncan, 1986, p 165]. In this
case, it might be appropriate to assume that the material produced by one job mix formula,
JMF, from WSDOT's specification could be described by the hypergeometric distribution.
In reality though, WSDOT uses about 80 pounds of material for each test from a lot which
may be thousands of tons. For all practical purposes this could safely be approximated by
the normal distribution.


Type A OC Curves 
Type A operating characteristic curves are based on the hypergeometric
distribution. The formula for the hypergeometric distribution is


$$P(X/n) = \frac{C^{N-m}_{n-X}\; C^{m}_{X}}{C^{N}_{n}} = \frac{\dfrac{(N-m)!}{(n-X)!\,(N-m-n+X)!}\cdot\dfrac{m!}{X!\,(m-X)!}}{\dfrac{N!}{n!\,(N-n)!}}$$


where P(X/n) = probability of X nonconforming in a sample of n items
X = number of items nonconforming in the sample
N = lot size
n = sample size
m = number of nonconforming items in the lot
C^m_X = number of combinations of X out of m


[Duncan, 1986, p 94] 


Because this formula is more difficult to manipulate, and because some calculators and
spreadsheets are limited in the size of factorial (e.g. 5! = 5*4*3*2*1 = 120) they can handle, it is
desirable not to work with the hypergeometric distribution, especially since it was
discovered that Microsoft Excel 5.0 was only capable of working with factorials up to
170!. As will be shown later, the Type B OC Curve is still a good approximation in many
circumstances.
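
For readers with access to a language that supports arbitrary-size integers, the factorial limit is less of an obstacle. The sketch below is an illustration only, not the method used in this study. It treats m as the count of nonconforming items in the lot (an assumption needed for the combination terms to make sense), and the lot of 1000 items with 50 nonconforming is hypothetical.

```python
from math import comb, factorial

def hypergeometric_pmf(x, n, lot_size, m):
    """Probability of x nonconforming items in a sample of n drawn without
    replacement from a lot of lot_size items, m of which are nonconforming."""
    return comb(lot_size - m, n - x) * comb(m, x) / comb(lot_size, n)

# Hypothetical lot: 1000 items, 50 nonconforming, sample of 20.
probs = [hypergeometric_pmf(x, 20, 1000, 50) for x in range(21)]
print(round(sum(probs), 6))       # probabilities over all possible x sum to 1

# Python integers are exact, so factorials well beyond 170! pose no problem.
print(len(str(factorial(1000))))  # number of digits in 1000!
```

Because `math.comb` works on exact integers, the only rounding occurs in the final division.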


Effects of Large Sample Sizes on Type A OC Curves 

Previously it was shown that the hypergeometric distribution and binomial
distribution are the basis for the probabilities of acceptance of Types A and B curves. For
the most part, Type B curves are almost exclusively used in statistically based acceptance
plans. The reason for this is that as the lot size increases, the lot has a diminishing impact
on the behavior of the OC Curve. In fact the general “rule-of-thumb” is that if the lot size
is at least ten times the size of the sample, the Type A and B curves are indistinguishable.
The Type A curve will always be below the Type B curve, or rather, the probability of
acceptance will always be lower for a Type A curve than for Type B. But as mentioned
above, the difference is only significant if the lot size is small relative to the sample.
Figure 2-3 below is used to demonstrate this difference, where N is the lot size, n is the
sample size, and c is the acceptance number [Montgomery, 1991, pp 562-563]. The
acceptance number is the maximum number of defective items tolerable in one sample.
Figure 2-3 Type A vs Type B OC Curves [Montgomery, 1991, p 563]


Finally, to put these different types of curves into perspective on how they
are used: Type A curves are typically not used because the hypergeometric distribution is
difficult to work with. Besides, most of the time a Type B curve will suffice because of
the relative differences in size between the samples and the lots from which they came. So
generally, Type B curves are used in both Type A and B situations.
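
The rule of thumb about lot size can be checked numerically. The following is a sketch under assumed values (sample size 10, acceptance number 0, 5% defective, lot sizes of 20, 100, and 1000), none of which come from the report. It compares the hypergeometric (Type A) and binomial (Type B) probabilities of acceptance.

```python
from math import comb

def type_b_accept(p, n, c):
    """Binomial (Type B) probability of acceptance."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

def type_a_accept(lot_size, m, n, c):
    """Hypergeometric (Type A) probability of acceptance for a lot of
    lot_size items containing m nonconforming items."""
    ways = sum(comb(lot_size - m, n - d) * comb(m, d) for d in range(c + 1))
    return ways / comb(lot_size, n)

n, c, p = 10, 0, 0.05          # assumed plan and quality level
gaps = []
for lot_size in (20, 100, 1000):
    m = round(p * lot_size)    # nonconforming items in the lot
    gap = type_b_accept(p, n, c) - type_a_accept(lot_size, m, n, c)
    gaps.append(gap)
    print(lot_size, round(gap, 4))
```

For these assumed values the gap between the two curves shrinks steadily as the lot grows relative to the sample, which is the behavior the rule of thumb describes.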


Discrete/Continuous Data 

Just as important as understanding which type of
distribution applies is understanding whether the situation deals with discrete or
continuous data. As mentioned earlier, hypergeometric and binomial distributions
represent discrete data, while the normal distribution represents continuous data.
Discrete data are data that can assume only an integer value. Continuous data can assume
any value between two limits, limited only by the precision of the instrument [Blank, 1980, p
8]. An example of discrete data would be the number of marbles in a bucket, and an
example of continuous data would be the number of minutes it takes to run a mile. The
reason these distinctions are mentioned is that most acceptance plans make the
simplifying assumption that the data is normally distributed (i.e. continuous data), when in
reality it may not be. The primary reason this assumption is made is that the normal
distribution is by far the easiest to manipulate, and the one for which probability tables are readily
available. It is also reasonable to expect to find that a binomial distribution has been
substituted for a hypergeometric distribution (as a close approximation), and then that a
normal distribution has been substituted for a binomial distribution, also as a close
approximation. If these two successive substitutions are made, the result is a distribution
that represents continuous data from an infinite universe being assumed as equivalent to a
set of data that may be discrete and from a finite universe. It is only when a sampling plan
designer recognizes these difficulties that the appropriate model can be applied, or at least
that assumptions can be made that will not significantly affect the integrity of the plan.


AQL/RQL 

As different organizations began developing statistical specifications, they quickly
discovered that it was very difficult to define a single level of quality that clearly
distinguished between acceptable and rejectable work. Instead it was much easier to
define a range of quality where the high end was called the acceptable quality level,
AQL, and the low end, below which the quality was poor enough to reject it, the
rejectable quality level, RQL. In between these two levels of quality, the work was
considered to be poor enough to justify a pay reduction, but not so poor as to warrant
rejection or replacement [Weed, 1994, p 1]. This was the genesis of the concept of
adjusted pay, which provided a means to accept slightly defective work or material for a
reduced pay amount which was agreed upon in the contract documents.

The next question then becomes, what are the appropriate sizes of buyer’s and 
seller’s risks? There are no hard and fast rules, but generally as a means to determine 
appropriate levels, the defects in question must first be classified. The following 
distinctions are made in “Statistically Oriented End-Result Specifications”, TRB, 1976: 


Critical: This defect will make the product dangerous to use 

Major: This defect will seriously impair performance of the item 

Minor: This defect may impair performance but not seriously 

Contractual: This defect is likely to have insignificant effect on 
performance 


MIL-STD-414 describes defects as follows 


A defect is a deviation of the unit of product from requirements of the 
specifications, drawings, purchase descriptions, and any changes thereto in 
the contract or order. Defects normally belong to one of the following 
classes, however defects may be placed in other classes: 

Critical Defects. A critical defect is one that judgment and
experience indicate could result in hazardous or unsafe conditions for
individuals using or maintaining the product; or, for major end item units
of product, such as ships, aircraft, or tanks, a defect that could prevent
performance of their tactical function.

Major Defects. A major defect is a defect other than critical, that 
could result in failure, or materially reduce the usability of the unit of 
product for its intended purpose. 

Minor Defect. A minor defect is one that does not materially
reduce the usability of the unit of product for its intended purpose, or is a
departure from established standards having no significant bearing on the
effective use or operation of the unit.

[MIL-STD-414, 1957, p 1] 


Recognizing that a critical defect should have a much lower acceptable quality level
than a minor one, assuming percent defective, MIL-STD-414 provides plans, and the OC
Curves which describe them, for AQLs of 0.04%-15.0%, which means 0.04%-15.0% of
the characteristics tested can be defective, but still considered acceptable depending on the
criticality of the characteristic. This is one reason why a variables inspection method, such
as MIL-STD-414, requires a separate plan for each quality characteristic in question.
Variables sampling plans will be discussed in more detail later, but generally, variables are
quality characteristics that can be measured on a numerical scale, and attributes are quality
characteristics that are expressed on a “go, no-go” basis [Montgomery, 1991, p 553].

Some sampling plans do not make distinctions between the varying levels of
defects, and broadly assign a very typical value of 5% risk at the AQL to the contractor.
Presumably this represents the risk that the plan designer wishes for all the quality
characteristics being measured, which may not be appropriate. The problem is that for
varying sample sizes it is very difficult to maintain control of the discriminating power of
the sampling plan unless two points on the OC Curve are predetermined.


Why a Zero AQL is not Practical 

As desirable as it may sound to have an acceptance procedure requiring 0% 
defective, in reality unless it represents a quality characteristic that could determine a life 
or death situation, it is not practical. In theory, the ideal OC Curve could be reached
provided there is 100% error free inspection. It is clear that this level of inspection will be 
much more expensive than random sampling, and that all processes have some inherent 
variability making error free inspection unlikely. Figure 2-4 demonstrates the effect zero 


tolerance has on the shape of an OC Curve. 
Figure 2-4 Effect of Zero Tolerance on OC Curves [Montgomery, 1991, p 564]


Generally, sampling plans that have zero tolerance will be convex through their range. It
can be readily seen that the probability of acceptance rapidly decreases for relatively small
percents defective. This can be a severe consequence to the contractor and should be
expected to be reflected in contractor bids [Montgomery, 1991, p 563].
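
The effect of zero tolerance can be illustrated numerically with the binomial (Type B) model. This is a sketch only: the sample size of 89 and the fractions defective are chosen for illustration, and `accept_prob` is a hypothetical helper name.

```python
from math import comb

def accept_prob(p, n, c):
    """Binomial probability of finding c or fewer nonconforming items
    in a sample of n when the fraction defective is p."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

n = 89  # illustrative sample size
for c in (2, 1, 0):
    row = [round(accept_prob(p, n, c), 3) for p in (0.005, 0.01, 0.02, 0.04)]
    print(f"c={c}:", row)
```

With c = 0 the probability of acceptance reduces to (1 - p)^n, so even 1% defective material is rejected more often than it is accepted at this sample size, while the same material is usually accepted when c = 2.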


Using AQL/RQL in Developing OC Curves 

As noted earlier, the OC Curve’s function is to demonstrate graphically the
probability of accepting a product that is provided at a certain level of quality. OC Curves
are generally designed so as to pass through, or very near, two points that are important to
the plan designer. The points which are easiest to quantify are those at the rejectable and
acceptable quality levels where the plan designer wishes to assign specific risk based on
the criticality of the characteristic. Theoretically any two points could be used, but usually
the designer begins with the desired α and β risks at the AQL and RQL. Preferably a
sufficient study of the characteristic to be measured should be conducted to ascertain what
quality levels are appropriate. This means that the AQL and RQL should be realistic, as
should the specification limits, and not just perpetuate limits used in previous acceptance
procedures. If historical data is available, the FHA uses as a rule of thumb the deviation of
the mean from the specification plus two standard deviations as specification limits. An
example of perpetuating limits which are unnecessary might be a specification which
requires the use of a high quality, expensive aggregate for a secondary road which could
realistically be constructed with a local, cheaper aggregate with satisfactory results.
When there is no data to support choosing a specific level, typically α is 5% and β is
set at a minimum distance of 2σ from the mean [TRB, 1976]. According to Willenbrock,
for non-critical products, α and β are usually chosen as 0.5% and 10% respectively
[Willenbrock, 1976, p20.33]. For a non-critical quality characteristic it is unusual for α to
be 0.5%, so perhaps that author meant 5% instead. If not, this demonstrates the
variability in references for choosing buyer’s and seller’s risks.

The contractor will always be concerned with the level of quality, or quantity of 
material allowed to be defective and still have a predetermined chance of having that 
material accepted. Or in other words, at the 95% probability of acceptance, the contractor 
might be interested in the corresponding percentage of defective material since this is 
typically where the AQL is set. It should be noted that the AQL is NOT a property of the 
acceptance plan. It is rather the lowest level of quality that the owner or buyer will accept
as a process average. Also the AQL is NOT intended to be a specification or target value
for the contractor. It is instead simply a standard chosen by the owner to sentence the
material being presented for inspection. OC Curves are designed so there is a high
probability of acceptance at the AQL, and a low probability of acceptance at the RQL.
Again, the RQL is not a characteristic of the sampling plan, but a standard by which the
owner will judge poor material offered for inspection [Montgomery, 1991, pp 561-562].
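
One way to see how two chosen points pin down a plan is a direct search over attributes plans using the binomial (Type B) model. The sketch below is an assumption-laden illustration, not the procedure used in this report or in MIL-STD-414. The function names are hypothetical, and the inputs (AQL of 1% with α = 5%, RQL of 6% with β = 10%) are example values only.

```python
from math import comb

def accept_prob(p, n, c):
    """Binomial probability of acceptance for sample size n, acceptance number c."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

def design_plan(aql, rql, alpha, beta, max_c=50, max_n=5000):
    """Search for the smallest plan (n, c) whose OC curve gives
    Pa(AQL) >= 1 - alpha and Pa(RQL) <= beta. Returns None if not found."""
    for c in range(max_c + 1):
        n = c + 1
        # For a fixed c, Pa falls as n grows, so raise n until the
        # buyer's risk at the RQL is no more than beta.
        while n <= max_n and accept_prob(rql, n, c) > beta:
            n += 1
        # Accept this c only if the seller's risk at the AQL is also met.
        if n <= max_n and accept_prob(aql, n, c) >= 1 - alpha:
            return n, c
    return None

plan = design_plan(aql=0.01, rql=0.06, alpha=0.05, beta=0.10)
print(plan)
```

Because the search enforces both risks exactly, it may return a somewhat larger plan than one read from a nomograph or table, which only needs to pass near the two points.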


Designing a Specified OC Curve 

Once a plan designer has determined the appropriate levels of risk for a certain 
quality characteristic at the acceptable and rejectable quality levels, those points can be 
used to design an OC Curve that passes through or close to them. Designing an OC 
Curve is the same thing as designing a sampling plan. For attributes sampling, given a 
sample size, and an acceptance number, it is possible to calculate the varying probabilities 
of acceptance using the binomial equation, Equation 2-1. For example, given a sample 
size n=89, and an acceptance number (the maximum number of defective items tolerable in 
a sample) c=2, then the probability of acceptance is the probability that d is less than or 
equal to c, or

$$P_a = P(d \le c) = \sum_{d=0}^{c} \frac{n!}{d!\,(n-d)!}\, p^{d} (1-p)^{n-d}$$

and for a lot fraction defective where p = 0.01, n = 89, and c = 2, then

$$P_a = \sum_{d=0}^{2} \frac{89!}{d!\,(89-d)!}\,(0.01)^{d} (0.99)^{89-d} = 0.9397$$

[Montgomery, 1991, p 559]
In other words, the probability of accepting a lot that is 0.01 fraction defective with a 
sample size of 89 and able to tolerate 2 defective in the sample, is the summation of the 
probabilities of 0 defective, 1 defective, and 2 defective. 

Table 2-1 below shows computed probabilities for fraction defective from 0.005 to 


0.090 for n=89 and c=2. 
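
The probabilities of acceptance for this plan can be recomputed from the binomial equation. The following is a minimal sketch and not a reproduction of the printed table. The function name `accept_prob` is hypothetical, and the spacing of the fractions defective between 0.005 and 0.090 is an assumption.

```python
from math import comb

def accept_prob(p, n, c):
    """Sum of the binomial probabilities of 0, 1, ..., c defective items."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

n, c = 89, 2
# Assumed grid: 0.005, then 0.01 through 0.09 in steps of 0.01.
fractions = [0.005] + [i / 100 for i in range(1, 10)]
table = {p: accept_prob(p, n, c) for p in fractions}

print("fraction defective   probability of acceptance")
for p, pa in table.items():
    print(f"{p:.3f}                {pa:.4f}")
```

Plotting these pairs, with fraction defective on the X axis and probability of acceptance on the Y axis, traces the OC Curve for the plan.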

Sample as a Fixed Percentage of Lot Size

Another potential problem besides setting AQL at zero is establishing a sample size
as a fixed percentage of the lot size. The problem with this is traced back to how an OC
Curve behaves with varying sample sizes. It was stated previously that an OC Curve
becomes more discriminating, or rather the slope steepens, with a larger sample size. In
effect then, the level of protection afforded both the contractor and owner will vary
depending on sample size [Montgomery, 1991, pp 564-565]. This is illustrated in Figure
2-5. In this figure the lot sizes vary from 100 to 1000, and for each the sample is fixed at
10% of the lot size with c=0, or zero AQL.

Figure 2-5 Sample as a Percentage of Lot Size [Montgomery, 1991, p 564]


The resulting curve is more discriminating, or steeper, for larger sample sizes. So although
the intent may have been to simplify the sampling plan, the effect is a drastically changing
level of protection for the contractor at small fractions defective, which may not have been
intended. It should also be noted, though, that the larger sample size gives a better
“picture” of lot quality.
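
The effect can be illustrated numerically. The sketch below assumes a finite lot (hypergeometric probabilities), a sample of 10% of the lot, and c=0; the lot sizes and the 1% fraction defective are illustrative choices, and `prob_accept_zero` is a name introduced here.

```python
from math import comb

def prob_accept_zero(lot_size: int, fraction_defective: float,
                     sample_pct: float = 0.10) -> float:
    """P(no defectives in the sample) for a finite lot, with n = sample_pct * N and c = 0."""
    n = round(lot_size * sample_pct)
    defectives = round(lot_size * fraction_defective)
    # Hypergeometric probability of drawing zero defectives without replacement
    return comb(lot_size - defectives, n) / comb(lot_size, n)

for lot_size in (100, 200, 500, 1000):
    pa = prob_accept_zero(lot_size, 0.01)
    print(f"N = {lot_size:4d}  n = {round(lot_size * 0.10):3d}  Pa at 1% defective = {pa:.3f}")
```

For the same 1% defective material, the probability of acceptance falls sharply as the lot, and with it the sample, grows, which is the changing level of protection described above.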


Single and Double Specification Limit Plans 

It is important that a few additional sampling concepts and terminology be 
described. A sampling plan will fall into one of two categories: either a single
specification limit plan or a double specification limit plan. A single specification limit
plan is one where the quality characteristic is compared to a single limiting value. In this
case acceptance is based on whether the sample quality characteristic is less than or equal
to, or greater than or equal to, the specified value. For example, in an asphalt concrete
pavement specification, the compaction requirement will be specified as greater than or 
equal to some minimum value. A manufacturer that produces plastic soda bottles might 
have a specification which has a minimum psi rating. These are both single specification 
limit plans. 

A double specification limit plan is used when the quality characteristic must fall 
within a range. The range is specified by a lower and upper limit. An asphalt concrete
pavement specification again affords a good example, where characteristics such as asphalt
content or gradation are specified as being acceptable as long as they fall between two limits.

Single and double specification limit plans are not to be confused with single or
double sampling. They sound similar, but have entirely different meanings. Single and
double sampling will be discussed later.


Relationship to k and M Sampling Methods


There are two methods used in statistical sampling plans. They are described variously as
Form 1 and Form 2, Procedure 1 and Procedure 2, or the k and M methods. The three
designations are essentially identical, and which one is used depends on the publication.
Here they will be referred to as the k and M methods since that is how they are described in
MIL-STD-414. The k method is essentially a distance test, and the M method is an area test.
Using MIL-STD-414 procedures, a minimum distance, k, from the mean of the sample data to the
lower specification limit (or upper specification limit) is obtained. If the sample data
indicate that the distance from the sample mean to the specification limit, measured in
standard deviations, is greater than k, the lot should be accepted. This concept is
illustrated in Figure 2-6.


Figure 2-6 Single (Lower) Specification Limit for a Normal Distribution

Since the total area under the curve represents 100% of the sample, it follows that
the further the mean is from the specification limit, the less area, or less
out-of-specification material, will be under the curve at the tail beyond the specification
limit. The same is true for an upper specification limit.

On the other hand, the M method uses a maximum area under the tail(s) of the
distribution marked by the upper and lower specification limits. In this method,
MIL-STD-414 gives a maximum area (represented as a percentage) not to be exceeded. From
Figure 2-7 it can be seen that the shaded areas, together representing the maximum area
not to be exceeded, M, can be achieved even if the mean of the distribution shifts slightly
left or right. That is because this sampling procedure does not give a maximum or
minimum value for the upper and lower limit tail areas, only a total area.

Figure 2-7 Double Specification Limit

So a shift in the mean to the left will increase the amount of material falling
outside the lower specification limit, and lower the amount falling outside the upper
specification limit. As long as the total area under the tails does not exceed M, the lot
should be accepted. Further, Figure 2-6 and Figure 2-7 graphically demonstrate the
concept of single and double specification limit plans.
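
The trade between the two tail areas can be sketched for an idealized normal distribution. The specification limits, standard deviation, and means below are hypothetical values chosen only to show the behavior; `tail_areas` is a name introduced here.

```python
from statistics import NormalDist

def tail_areas(mean: float, sigma: float, lsl: float, usl: float) -> tuple[float, float]:
    """Return (area below the lower limit, area above the upper limit) for a normal curve."""
    dist = NormalDist(mean, sigma)
    return dist.cdf(lsl), 1.0 - dist.cdf(usl)

# Hypothetical limits of 4.5 and 5.5 with a standard deviation of 0.25
for mean in (5.00, 4.95, 4.90):
    below, above = tail_areas(mean, 0.25, 4.5, 5.5)
    print(f"mean = {mean:.2f}  below LSL = {below:.4f}  above USL = {above:.4f}  "
          f"total = {below + above:.4f}")
```

As the mean shifts left, the lower tail grows and the upper tail shrinks; under the M method only the total is compared with M.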


Single Sampling vs Other Methods 

Single sampling is one of several methods available in statistical acceptance procedures. A
single sample does not imply a sample of one unit. A single sample could be one unit, or
thousands, and its size is usually referred to by the lowercase letter “n”. An acceptance
plan based on single sampling relies entirely on the integrity of the data obtained by
observing the characteristics of that one sample. Single sampling is adequate for most
situations.

By contrast, double sampling is a procedure by which a second sample may be
required before the lot can be sentenced. If the first sample has more defectives than
would allow an unquestioned “pass” (similar to AQL), but fewer than would require outright
rejection (similar to RQL), a second sample is taken to determine whether the combined
percent defective from both samples is above or below the rejection limit
[Montgomery, 1991, p 571].
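
The double sampling logic can be written as a short decision rule. This is a minimal sketch expressed in counts of defectives rather than percents; the names `c1` (acceptance number for the first sample) and `c2` (acceptance number for the combined samples), and the example values, are assumptions for illustration.

```python
from typing import Optional

def double_sampling_decision(d1: int, c1: int, c2: int, d2: Optional[int] = None) -> str:
    """Sentence a lot from defectives d1 (first sample) and, if needed, d2 (second sample)."""
    if d1 <= c1:
        return "accept"                  # unquestioned pass on the first sample
    if d1 > c2:
        return "reject"                  # outright rejection on the first sample
    if d2 is None:
        return "take second sample"      # in between: a second sample is required
    return "accept" if d1 + d2 <= c2 else "reject"

# Hypothetical plan with c1 = 1 and c2 = 3
print(double_sampling_decision(0, 1, 3))         # accept
print(double_sampling_decision(2, 1, 3))         # take second sample
print(double_sampling_decision(2, 1, 3, d2=1))   # accept
print(double_sampling_decision(2, 1, 3, d2=2))   # reject
```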

A multiple sampling plan is an extension of the double sampling plan. This plan
might involve more than two samples. If at any stage of the sampling the number of
defectives is less than or equal to the acceptance number (the maximum number of defectives
tolerable in a sample), the lot is accepted. If at any stage the number of defectives equals
or exceeds the rejection number, the lot is rejected; otherwise the next sample is taken.
This procedure requires that a limit be placed on the maximum number of samples that may be
taken [Montgomery, 1991, p 578].

Sequential sampling is an extension of both double and multiple sampling. This
method requires a sequence of samples to be taken from the lot, the number of which is
determined entirely by the results of the sampling process. Theoretically, this procedure
could perpetuate itself until the entire lot was sampled [Montgomery, 1991, p 579].

The description of the three previous sampling procedures gives an indication of
what is available, but by no means describes all acceptance procedures. Obviously the
more sampling that is done, the more expensive it will be. The plan designer must decide
early on how the cost of multi-tiered sampling trades off against the expected increase in
confidence in the sampling results. Should the plan designer wish to pursue one of these
alternate methods, it should be recognized that MIL-STD-414, inspection by variables, only
offers single sampling procedures. MIL-STD-105, inspection by attributes, offers single,
double, and multiple sampling.







Chapter 3 
QUANTIFYING RISK 

The focus of this research began by investigating the statistical risks, α and β,
assigned to the asphalt paving contractors and the Washington State Department of
Transportation (WSDOT). Although many agencies have converted to QA
specifications, there is a perception that a lack of understanding exists about the risks.
The original goal was to build on the work contained in WA-RD 326.1 and “back out” the
risks using data and quality/pay indexes in the report appendices. After concluding the
research, however, it was discovered that the pay factor tables had been designed based on
small sample theory, or the t-distribution, which considers the skewing effects of relatively
small samples.

Secondary to determining risk, this paper is intended to be used, at least in part, as
a tool to describe acceptance risks in a way which will be easily understood by
anyone with a technical background, but not necessarily versed in statistics. As such it
will enable those in public agencies charged with designing and developing statistical
acceptance plans to develop an awareness of some of those aspects which demand
attention, such as potential pitfalls associated with an unclear understanding of operating
characteristic curves. This section will examine WA-RD 326.1 for a determination of
WSDOT’s associated α and β risks.





Normal/Hypergeometric/Binomial/t Distributions 

Although WA-RD 326.1 specifically states that WSDOT's asphalt concrete
specifications are based on MIL-STD-414 with modifications, it was still studied to
determine to what degree this was true. MIL-STD-414 does not include, and
understandably so, a discussion of hypergeometric, binomial, t, and normal distributions.
It was written for those seeking an alternative to traditional non-QA inspection methods,
and relied on the agency using the standard to provide the expertise needed to determine
when it was appropriate to apply. In other words, someone had to know whether the
data collected by random sample was produced by a process which closely approximated a
normal distribution and, if it did not, either use an alternative sampling method or
recognize that the results could not accurately be quantified by the OC Curves included in
the standard. Although Duncan states that MIL-STD-414 can still be used in non-normal
situations, the further the data depart from normal, the less confidence there is in the OC
Curves which describe the acceptance plan’s behavior [Duncan, 1986, p 256]. Chapter 2
described the relationship between normal sampling data, whether from a continuous
process or single lot, and the resulting distribution: binomial or hypergeometric.
WSDOT's 1994 specification states that:

For the purpose of acceptance sampling and testing, a lot is defined as the
total quantity of material or work produced for each job mix formula
(JMF), placed and represented by randomly selected samples tested for
acceptance [Standard, 1994, 5-04.3(8)A, p 5-22].


This potentially places the lot from which data is obtained by WSDOT's sampling
method in the hypergeometric category since the material produced for one JMF is
a lot of finite size, and the sample material is not replaced. But despite this, at
least as far as testing and the relative sizes of the sample and lot are concerned,
the nature and method of data collection correctly create the presumption
of a normal distribution.


Type A/Type B OC Curves 

In many engineering circumstances, and in applied statistics, simplifying
assumptions are made to make design and analysis manageable. That is the case with
MIL-STD-414 and MIL-STD-105. Earlier it was demonstrated that Type A OC Curves
represent sampling plans with data resulting from a finite universe, or hypergeometric
distribution, and Type B OC Curves represent data from a continuous process, with
sample data resulting from the binomial distribution. It was also demonstrated earlier that
as the lot size increases, the Type A OC Curve approaches the Type B OC Curve. Only
Type B OC Curves are found in both Military Standards. Using Type B Curves is a safe
approximation as long as the lot is at least ten times the sample size, and the
sample is not small [Montgomery, 1991, p 562].
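
The closeness of the two curve types can be checked at a single point. The sketch below compares the hypergeometric (Type A) and binomial (Type B) probabilities of acceptance; the plan n=100, c=2, the 1% fraction defective, and the lot sizes are illustrative assumptions, as are the function names.

```python
from math import comb

def type_b(n: int, c: int, p: float) -> float:
    """Binomial probability of acceptance (continuous process)."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

def type_a(lot_size: int, n: int, c: int, defectives: int) -> float:
    """Hypergeometric probability of acceptance (finite lot, no replacement)."""
    ways = sum(comb(defectives, d) * comb(lot_size - defectives, n - d)
               for d in range(c + 1))
    return ways / comb(lot_size, n)

n, c, p = 100, 2, 0.01
for lot_size in (1000, 10000):           # lot is 10 and 100 times the sample size
    pa_a = type_a(lot_size, n, c, round(lot_size * p))
    print(f"N = {lot_size:5d}  Type A = {pa_a:.4f}  Type B = {type_b(n, c, p):.4f}")
```

The gap between the two values narrows as the lot grows relative to the sample, consistent with the ten-to-one guideline.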


Nomographs 

It is possible to design an acceptance sampling plan with a specified OC Curve.
Since the OC Curve simply represents the probability of acceptance over a range of quality
from perfect to poor, either the binomial or hypergeometric summation formula,
Equation 2-1 or Equation 2-2, is used. Using the formulas, however, is far from simple. They
are tedious, time consuming, and must be repeated many times over since each OC Curve
represents only one sample size and one acceptance number for a desired α and β. In
addition, since the equations are nonlinear, there is no direct solution
[Montgomery, 1991, p 565]. There is a simpler, though less accurate, way, which is to use
either the binomial or hypergeometric nomographs. Figure 3-1 is used for attributes
sampling plans, and Figure 3-2 is used for variables.

Figure 3-1 Binomial Nomograph-For Attributes Sampling Plans [Montgomery, 1991, p 566]

Figure 3-2 Hypergeometric Nomograph-For Variables Sampling Plans [Montgomery, 1991, p 627]


To use the attributes nomograph, Figure 3-1, first a line is drawn from the fraction
defective desired for the AQL on the left scale to the desired probability of acceptance
(1-α) at that fraction defective on the right scale, then another line from the fraction
defective desired for the RQL on the left scale to the desired probability of acceptance (β)
at that fraction defective on the right scale. Note that for the AQL the probability of
acceptance is 1 minus the seller’s risk. Then trace from the intersection of these two lines
curving up and to the right to read the required sample size, and then again from the
intersection down to the right or up to the left to read the acceptance number (recall that
the acceptance number is the maximum defectives tolerable in a sample). It is apparent that
the intersection will not always land cleanly on the lines in the nomograph. That means
there are several sampling plans available that will closely approximate the desired results
[Montgomery, 1991, pp 565-566].
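
With a computer, the same design problem can be approached by direct search rather than by nomograph. The following is a minimal sketch under the binomial (Type B) assumption; the function `design_plan`, its search limit, and the AQL/RQL values in the example are illustrative assumptions rather than values taken from any specification.

```python
from math import comb

def prob_accept(n: int, c: int, p: float) -> float:
    """Binomial probability of c or fewer defectives in a sample of n."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

def design_plan(aql: float, alpha: float, rql: float, beta: float, max_n: int = 2000):
    """Find the smallest acceptance number c, and its smallest n, meeting both risk points."""
    for c in range(max_n):
        for n in range(c + 1, max_n + 1):
            if prob_accept(n, c, rql) <= beta:           # buyer's risk satisfied
                if prob_accept(n, c, aql) >= 1 - alpha:  # seller's risk satisfied
                    return n, c
                break   # a larger n only lowers Pa at the AQL, so try the next c
    return None

# Illustrative risk points: AQL = 1% with alpha = 0.05, RQL = 6% with beta = 0.10
plan = design_plan(0.01, 0.05, 0.06, 0.10)
print("exact search:", plan)
print("n=89, c=2 gives Pa(AQL) =", round(prob_accept(89, 2, 0.01), 4),
      "and Pa(RQL) =", round(prob_accept(89, 2, 0.06), 4))
```

Because n and c must be whole numbers, the search returns a plan that meets both risk points exactly or better, whereas a plan read from the nomograph may only pass close to them.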

Figure 3-2 by itself works only for sampling plans using the k method. The
procedures for using this nomograph are exactly the same as for Figure 3-1, only instead
of reading an acceptance number, it gives a minimum value for k. Note that two values
for the sample size, n, can be read from this nomograph. Reading down, n is given for
situations where the process standard deviation (σ) is known, and reading up, for when it
is not known [Montgomery, 1991, pp 626-628]. When reading for unknown standard
deviation, trace upward from the intersection following the curved lines of the nomograph.
If the standard deviation is known, the sample size is read directly, and vertically, below
the intersection; do not follow the curved lines of the nomograph or the results will be the
same as if reading up. As might be expected, when the standard deviation, σ, is not
known, there is greater uncertainty, which requires a larger sample size for the same level
of confidence. After the sample is taken, the mean and standard deviation are calculated
and then used to determine Z from Equation 3-1, in this case where there is a single lower
specification limit.





$$Z_{LSL} = \frac{\bar{x} - LSL}{\sigma} \qquad \text{Equation 3-1}$$

where: Z_LSL is a standard normal deviate
x̄ = sample mean
LSL = lower specification limit
σ = sample standard deviation


If Z_LSL is ≥ k, then the lot is accepted.
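
The k method test for a single lower specification limit can be sketched in a few lines. The measurements, the limit, and the value of k below are hypothetical and are not taken from MIL-STD-414 or any WSDOT table; `k_method_accept` is a name introduced here.

```python
from statistics import mean, stdev

def k_method_accept(measurements: list[float], lsl: float, k: float) -> tuple[float, bool]:
    """Return (Z for the lower limit, accept?) using the sample mean and standard deviation."""
    x_bar = mean(measurements)
    s = stdev(measurements)            # sample standard deviation (n - 1 divisor)
    z_lsl = (x_bar - lsl) / s          # distance to the limit in standard deviations
    return z_lsl, z_lsl >= k

# Hypothetical compaction results (percent) against a hypothetical minimum of 91.0
z, accepted = k_method_accept([92.1, 93.4, 91.8, 92.9, 93.0], lsl=91.0, k=1.5)
print(f"Z = {z:.2f}, accept = {accepted}")
```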




It is important to note that for either nomograph, given a sampling plan, meaning a
sample size n, and either acceptance number, c, or minimum k, the probability of
acceptance for any fraction defective can be read directly from the nomograph.


k vs M Method 


Figure 3-2 can be used for the M method, but requires additional steps. First, for a
case involving a single specification limit, n and k are determined from the nomograph as
before. Figure 3-3 is then used to convert the value k into a value M.

Figure 3-3 Converting k to M [Montgomery, 1991, p 629]

Figure 3-3 is entered from the X axis, but to obtain that value it must first be
calculated from Equation 3-2.





$$X = \frac{1 - \dfrac{k\sqrt{n}}{n-1}}{2} \qquad \text{Equation 3-2}$$

where: X = abscissa on Figure 3-3
k = value obtained from Figure 3-2
n = sample size obtained from Figure 3-2
Reading up from the X axis to the intersection with the sample size, M is then read
horizontally from the Y axis. As long as the fraction defective is ≤ M, the lot is accepted.

But to figure the fraction defective in terms of Z requires yet another step. Here Z,
calculated from Equation 3-1, is used to enter Figure 3-4.

Figure 3-4 Fraction Defective from Z [Montgomery, 1991, p 629]

The fraction defective is then read horizontally on the left vertical axis from the
intersection of Z read from the right vertical axis and n up from the horizontal axis.
Again, as long as the fraction defective is ≤ M, the lot is accepted. If Figure 3-2 is to be
used for a double specification limit, M method plan, k and n are found as before, then M
from Figure 3-3 and Z_LSL and Z_USL from Equation 3-1. Both Z's are converted to fraction
defective from Figure 3-4 and added together. If the two added together are ≤ M, the lot
is accepted [Montgomery, 1991, pp 628-630]. Note that it is possible for Z_LSL and Z_USL to
vary, and still result in the same fraction defective.
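
The chart look-ups can be approximated numerically. The sketch below rests on an assumption not stated in the text: that the charts in Figure 3-3 and Figure 3-4 plot the area under a symmetric beta distribution with parameters (n/2 - 1, n/2 - 1) up to the abscissa of Equation 3-2, which is the estimate commonly associated with standard deviation plans of this type. The function names, the numerical integration, and the example values of n, k, and Z are all illustrative.

```python
import math

def symmetric_beta_cdf(x: float, a: float, steps: int = 2000) -> float:
    """Area from 0 to x under a beta(a, a) density, by Simpson's rule (assumes a >= 1)."""
    def density(t: float) -> float:
        return (t * (1.0 - t)) ** (a - 1.0)       # unnormalized density

    def simpson(lo: float, hi: float) -> float:
        h = (hi - lo) / steps
        total = density(lo) + density(hi)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * density(lo + i * h)
        return total * h / 3.0

    return simpson(0.0, x) / simpson(0.0, 1.0)    # normalize by the full area

def fraction_beyond_limit(z: float, n: int) -> float:
    """Estimated fraction defective for a distance z (in standard deviations) and sample size n."""
    if n < 4:
        raise ValueError("this sketch only handles n >= 4")
    x = 0.5 * (1.0 - z * math.sqrt(n) / (n - 1))  # abscissa, as in Equation 3-2
    x = min(1.0, max(0.0, x))
    return symmetric_beta_cdf(x, n / 2.0 - 1.0)

def m_method_accept(z_lsl: float, z_usl: float, n: int, k: float):
    """Double limit M method: accept if the two tail estimates together do not exceed M."""
    max_allowable = fraction_beyond_limit(k, n)   # M, converted from k
    estimated = fraction_beyond_limit(z_lsl, n) + fraction_beyond_limit(z_usl, n)
    return estimated, max_allowable, estimated <= max_allowable

# Hypothetical plan (n = 10, k = 1.5) and hypothetical sample results
print(m_method_accept(z_lsl=2.0, z_usl=2.5, n=10, k=1.5))
```

Different pairs of Z_LSL and Z_USL can yield the same total, since only the sum of the two tail estimates is compared with M.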


MIL-STD 414 

Following World War II, the Department of Defense began to consolidate the 
sampling plans that had been developed during the war. MIL-STD-414, acceptance 
sampling by variables, was introduced in 1957 as an alternative to MIL-STD-105, 
acceptance sampling by attributes [Duncan, 1986, p 290], [Montgomery, 1991, p 630]. It 
was originally intended for use in Government procurement, supply and storage, and 
maintenance inspection operations where a single quality characteristic can be measured. 
The standard is set up for expressing quality in terms of percent defective, but can be 
easily modified for just the opposite, which would be percent within limits. The 
underlying assumption in developing this plan was that the single quality characteristic 
measured in a random sample is normally distributed. MIL-STD-414 can still be used in
nonnormal situations; however, the risks involved will be different than those indicated on
the operating characteristic curves included in the standard [Duncan, 1986, p 301].





Advantages/Considerations 

Compared to attributes sampling plans, like MIL-STD-105, the advantage is that
smaller sample sizes can be used for the same level of confidence. Of course the trade off
is that for any given sampling plan, it will likely be more expensive to quantitatively
measure a single characteristic against a standard, rather than determine a simple pass or
fail as in the attributes sampling plan. Therefore early in developing a sampling plan, a
quantitative decision must be made to determine which approach is more cost effective:
relatively small samples and meticulous measurements, or large samples with simple
pass/fail measurements.

In addition, if a standard such as MIL-STD-414 is indiscriminately used, and
applied in a situation where the data is not normally distributed, the result will be an
inability to accurately predict the risks of accepting a product at varying levels of process
quality. In other words, the operating characteristic curves included in the standard begin
to lose applicability with increased skewness or kurtosis. Skewness is a measure of how
symmetrically the data are distributed about the mean, and kurtosis is a measure of
peakedness [Blank, 1980, pp 67, 70].
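
Both measures can be computed from sample data before a variables plan is applied. The sketch below uses the common moment-based definitions, with kurtosis expressed as excess over the normal value of 3; whether these match the exact definitions in the cited reference is an assumption, and the data sets are made up for illustration.

```python
def central_moment(data: list[float], order: int) -> float:
    """Average of (x - mean) raised to the given power."""
    m = sum(data) / len(data)
    return sum((x - m) ** order for x in data) / len(data)

def skewness(data: list[float]) -> float:
    """Zero for symmetric data; positive when the long tail is to the right."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data: list[float]) -> float:
    """Zero for a normal distribution; positive for more peaked, heavier-tailed data."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 1.0, 2.0, 9.0]
print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```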


MIL-STD 414 Sections 

MIL-STD-414 is divided into four sections, A through D. Section A gives a
general description of terms used, a method for classifying defects as “Critical, Major, or
Minor”, the range of acceptable quality levels used in the standard (0.04-15%),
acceptability criterion, and sample selection [MIL-STD-414, pp 1-3]. Section B covers
sampling plans where the variability of the process is unknown, and the standard deviation
is used to determine percent defective. This is done through one of two methods, Form 1
or Form 2, which will be described shortly. Section C covers sampling plans where the
variability is unknown, but uses a range method in lieu of standard deviation to determine
percent defective. The range method is not commonly used today because its results are not
as meaningful as those of the standard deviation method. It was developed because it is
mathematically easier to manipulate. With today’s calculators and computers there remains
little justification for using this method. Section D is used when the process variability
is known. The advantage here is that if the process with its inherent variability is known
well enough, smaller sample sizes can be used to determine lot quality with the same
confidence as the plans in sections B and C. This results in even cheaper sampling.

All methods, B through D, provide for sampling plans based on single and double
specification limits. A single specification limit would be a plan with the criterion that the
sample would be either ≤ or ≥ a single value, such as for checking compaction of asphalt
concrete. A double specification limit is used when a range of values is acceptable, but
falling below or above that range is not, as in specifying an asphalt content.

In sections B, C, and D, a choice of either Form 1 or Form 2 is available depending
on the circumstances. Other statistics references, such as Montgomery and Duncan,
which discuss acceptance sampling procedures, describe these approaches as k and M
methods, or Procedures 1 and 2. Form 1, Procedure 1, and the k method are the same.
This technique works by specifying a minimum distance, k, from the mean of the sample
data to the value at which the plan wishes to reject material (single specification limit) in
terms of a number of standard deviations. If that number is greater than k, the lot is
accepted. Form 2, Procedure 2, and the M method are also the same. In this technique,
instead of specifying a distance from the mean, a maximum area under the normally
distributed curve is not to be exceeded. It can be readily seen that the M method can be
used for either single specification or double specification limit plans, whereas the k
method is suited only for single specification limit plans.


Methods 

After determining the acceptable quality level (AQL) for the characteristic to be 
measured, and how large the lot will be, Tables A-1 and A-2 in MIL-STD-414 can then be 
used to determine a specific AQL, and sample size code letter. With these two pieces of 
information, the tables in Sections B, C, and D can be accessed for sample size and 
acceptance numbers. The acceptance criterion will either be a distance (k) from the mean, 
or an area expressed as a percentage (M) as described above. Unless circumstances 
dictate otherwise, normal inspection is always used first. The plan allows for normal, 
tightened, and reduced inspection. It should be noted that if MIL-STD-414 is to be used 
at all, it should be used as closely as possible to the way it was intended. The reason is 
that confidence in the tables and operating characteristic curves is diminished, or 
eliminated when the plan is not used as intended. In other words, if reduced sampling is 
indicated, there should be reduced sampling. The opposite holds true also. If there is 
more than one quality characteristic, there should be a comparable number of inspection
plans.

As mentioned earlier, the primary advantage of a variables sampling plan is that it
requires a smaller sample than an attributes plan for the same operating characteristic
curve. One of the major disadvantages is that it is necessary to have a separate plan for
each quality characteristic inspected. For example, if an item were inspected for three
quality characteristics, it would require three separate variables inspection plans.


Impact of Very Low AQL’s 

Another primary disadvantage of variables sampling plans is that if the process is 
nonnormal, and the sample size is very small, the probability computations could be 
seriously affected. All variables sampling plans use the mean and standard deviation to 
estimate the fraction nonconforming. It is readily seen by studying Figure 2-6 and Figure 
2-7, that if the AQL is very small, it would be relatively far out into the tail(s) of the 
distribution. If the distribution is nonnormal, i.e. peaked or skewed, the effects would be 
more noticeable on the tails. An example from Duncan may illustrate this more clearly: 

...If the mean of a normal process or lot lies three standard
deviations below a single upper specification limit, it will have no more
than 0.00135 nonconforming. On the other hand, if in a nonnormal process
or lot with considerable skewness and/or kurtosis, say with γ1 = 1.00 and
γ2 = 1.5, the mean lies three standard deviations below the specification
limit, possibly 0.01000 of the items might be nonconforming or seven times
that for a normal distribution [Duncan, 1986, p 256].


Figure 3-5 below shows how nonnormal distributions affect the tail area as compared to
normal.

Figure 3-5 Effect of Small AQL on Distribution Tail Area [Duncan, 1986, p 256]
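
The normal-distribution figure in Duncan's example is easy to reproduce. The short sketch below computes the area beyond three standard deviations for a standard normal curve and compares it with the 0.01000 quoted for the nonnormal case; the nonnormal value is simply taken from the quotation, not computed.

```python
from statistics import NormalDist

# Area beyond a limit three standard deviations above the mean of a normal distribution
normal_tail = 1.0 - NormalDist().cdf(3.0)
nonnormal_tail = 0.01000    # value quoted from Duncan for the skewed case

print(f"normal tail beyond 3 sigma = {normal_tail:.5f}")
print(f"ratio of nonnormal to normal = {nonnormal_tail / normal_tail:.1f}")
```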


MIL-STD 105 

The focus of this report centers on variables sampling, and therefore plans similar 
to MIL-STD-414. The discussion would not be complete, however, without a brief 
description of MIL-STD-105, attributes sampling. It too was developed during World 
War II, and the first version, MIL-STD-105A was issued in 1950 [Montgomery, 1991, p 
585]. The latest version is MIL-STD-105E. It is a collection of sampling schemes 
including single sampling, double sampling, and multiple sampling. For each of these 
schemes, there are provisions for normal, tightened and reduced inspection. If the plan is 
a percent defective plan, the AQL’s range from 0.10% to 10%. If instead it is a defects 
per unit plan, there are ten different AQL’s up to 1000 defects per 100 units 
[Montgomery, 1991, p 586] [MIL-STD-105]. 

As in MIL-STD-414, sample size is determined by lot size and the level of
inspection. In addition there are four special inspection levels, S1 through S4, that are
used for very small samples, but only in cases where large risks can be tolerated
[Montgomery, 1991, p 603] [MIL-STD-105].


Advantages 
Although subject to some controversy, as the sample size increases, the probability
of acceptance goes up for AQL work. The effect is that there is less chance of rejecting a
large lot, and a steeper OC Curve is produced [Montgomery, 1991, p 607].


Disadvantages 

The standard emphasizes only the producer’s risk end of the OC Curve. The only 
way to control the discriminatory power of the curve is by choosing sample size, and not 
all sample sizes are available for use. As the lot size increases, so does the sample size, 
but at a decreasing rate after n=80 [Montgomery, 1991, p 605]. 

Generally, larger sample sizes are required for the same level of confidence for
attributes sampling plans as compared to variables sampling plans.


Developing Formulas for Pay Curves 

Chapter 4 describes the two common methods for employing an adjusted pay
schedule for a statistical specification. Obviously a pay formula could be any of a number
of types of equations. Its intent is to provide a smooth transition from bonus pay at superior
quality down to substantially reduced pay at the rejectable quality level. If sufficient study
has been made in preparing a statistical specification, the designer will have a good feel for
the needed pay reductions at lower quality levels to cover the costs of earlier than
programmed repairs. Presuming those costs have been quantified to reflect a withheld
amount for a specific level of quality, it would be straightforward to plot the data and
graphically examine the pay trend from superior to poor quality. It might be expected that
the curve drawn through the points on this graph would mirror the OC Curve. After all,
the OC Curve represents the discriminating power of the acceptance plan, so the pay
curve would reflect varying pay factors over the same range of quality. This, however, is
not the case with the pay factor tables found in the FP-85 “Standard Specifications for
Construction of Roads and Bridges on Federal Highway Projects” and WSDOT’s 1994
Standard Specification. If the “steps” are ignored, these pay tables show a general
downward curve from bonus to rejection, as illustrated in Figure 3-6.


Figure 3-6 Stepped Pay Factors (stepped table values for sample size n=9)


The larger the sample, the "flatter" the curve becomes, eventually coming close to a
straight line from bonus to rejection for large sample sizes. But the "steps" cannot be
ignored since they demonstrate a critical aspect of how the pay tables are implemented.


Why the Nonlinear Formula was Used

Even if a specification using a pay table similar to the ones in the FP-85 or
WSDOT Standard Specification has appeared to perform as it was intended, it may still be
desirable to replace the tables with a pay formula. This is mainly because a plan designer
or contract administrator may wish to do away with potential conflicts with the
contractors over missing a higher pay increment because of the “steps”. It should be
noted that the pay steps in WSDOT’s specification are relatively small and therefore may
be less likely to create conflict than large pay steps would.

Weed, of the New Jersey Department of Transportation has developed a program 
called OCPLOT which is designed as a tool for plan designers to do a “what if” analysis 
on their pay formulas to see if it will perform as desired. The limitations are that only two 
general pay equation formats, called linear and nonlinear in the program, are available for 


a 


analysis. The linear equation is PF — A— B(PD) , where PF is the pay factor, A is the 
bonus that would be paid when there are zero defects, B is a constant, and PD is the 
percent defective. The nonlinear equation is PF — A — B(PD) , where B and C are both 


constants. It should be noted that OCPLOT also offers to the user the opportunity to use 
the equivalent percent within limits formulas instead of percent defective. Because 
OCPLOT was used to evaluate the pay factor tables in both the FP-85, and WSDOT's 
1994 Standard Specification, only the nonlinear equation was used to approximate what 
an alternative pay formula curve would look like compared to the stepped values. 
Appendix C shows the behavior of the pay tables compared to the curve generated from 


the nonlinear pay equation used in OCPLOT. The nonlinear formulas were developed by 
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using 105 for A, since perfect quality, or zero defects, has a pay factor of 1.05. Next the 


quality level at a pay factor of 1.00 and 0.75, was read from the table and subtracted from 
100 for percent defective. ‘This resulted in two equations and two unknowns. From here, 
algebra was used to determine B and C for each sample size. As evidenced by Appendix 
C, the nonlinear formula 1s a good approximation of the stepped table values. The pay 
formula derivations are included in Appendix B. OCPLOT will be discussed in more 


detail later. 
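
The two-equation solution described above can be sketched in a few lines. This is only an illustration, assuming the nonlinear form PF = A - B(PD)^C with A = 105; the percent defective readings (5 and 25) are hypothetical placeholders, not values taken from the FP-85 or WSDOT tables, and the function names are invented for this sketch.

```python
import math

def solve_nonlinear_pay(a, pd_at_100, pd_at_75):
    """Solve PF = a - B * PD**C for B and C from two table readings.

    pd_at_100: percent defective read from the table at a pay factor of 100
    pd_at_75:  percent defective read from the table at a pay factor of 75
    """
    drop_100 = a - 100.0          # equals B * pd_at_100**C
    drop_75 = a - 75.0            # equals B * pd_at_75**C
    # Dividing the two equations eliminates B and leaves C.
    c = math.log(drop_75 / drop_100) / math.log(pd_at_75 / pd_at_100)
    b = drop_100 / pd_at_100 ** c
    return b, c

def pay_factor(pd, a, b, c):
    """Evaluate the nonlinear pay equation at percent defective pd."""
    return a - b * pd ** c

# Hypothetical readings for one sample size (not from the actual tables).
A = 105.0
B, C = solve_nonlinear_pay(A, pd_at_100=5.0, pd_at_75=25.0)
for pd in (0.0, 5.0, 15.0, 25.0):
    print(pd, round(pay_factor(pd, A, B, C), 2))
```

Repeating the calculation with the readings for each sample size would give one pair of constants B and C per row of the table.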





Chapter 4 
WSDOT SAMPLING PLAN 
WSDOT’s asphalt concrete QA specification is similar to MIL-STD-414 in that it 
uses the variability unknown, double specification limit, standard deviation method, for 
estimating lot quality. Randomly selected sample data is used to compute the mean and 
standard deviation, then Quality Indexes are computed for entering Table 1 on page 1-34 
in the 1994 Standard Specification, to determine percents within upper and lower 
specification limits. For a double specification limit plan, the percents within limits are 
added together, and then 100 is subtracted. The resulting quality level is then used to 
determine a final pay factor which is subsequently used in formulas for the job mix 
compliance incentive factor, and compaction incentive price adjustment factor. These 
factors are ultimately used to calculate the final adjustment to the contractor’s bid price 
per ton of asphalt concrete. Refer to WA-RD 326.1, pages 8-15 for an example [Markey 
et al., 1994, pp 8-15]. 
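
As a rough sketch of the first steps of this calculation, the code below computes the mean, the sample standard deviation, and the upper and lower quality indexes in the usual standard deviation method form, Q = (limit - mean)/s. The test results and specification limits are hypothetical, and the conversion from quality index to percent within limits is left as a lookup in Table 1 of the specification rather than reproduced here.

```python
from statistics import mean, stdev

def quality_indexes(results, lower_limit, upper_limit):
    """Return mean, sample standard deviation, and upper/lower quality indexes."""
    x_bar = mean(results)
    s = stdev(results)                    # sample standard deviation (n - 1)
    q_upper = (upper_limit - x_bar) / s
    q_lower = (x_bar - lower_limit) / s
    return x_bar, s, q_upper, q_lower

def total_percent_within_limits(pct_within_upper, pct_within_lower):
    """Combine the two one-sided percents for a double specification limit."""
    return pct_within_upper + pct_within_lower - 100.0

# Hypothetical sublot results and limits, for illustration only.
results = [5.1, 5.3, 4.9, 5.2, 5.0]
x_bar, s, q_u, q_l = quality_indexes(results, lower_limit=4.5, upper_limit=5.5)
print(round(x_bar, 3), round(s, 4), round(q_u, 3), round(q_l, 3))

# The two percents would be read from the specification table using q_u and q_l;
# the values 98 and 100 below are placeholders.
print(total_percent_within_limits(98.0, 100.0))
```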


Departure from MIL-STD 414 

Beyond calculating mean, standard deviation, and quality indexes, the WSDOT 
Standard Specification departs from MIL-STD-414. Table 1 of the 1994 Standard 
Specification is the WSDOT equivalent of MIL-STD-414’s Table B-5. The fact that 
WSDOT uses percent within limits instead of MIL-STD-414’s percent defective is not 
significant. Of limited significance is that WSDOT Table 1 and MIL-STD-414 Table B-5 
do not use the same sample size categories, but as was mentioned in the introduction, the 
specification designers’ intent was not to exactly mirror MIL-STD-414. The resulting 
estimates of percents defective for some sample sizes will therefore differ from those obtained 
if MIL-STD-414 were used instead of WSDOT’s Standard Specification. 

MIL-STD-414 uses a two step process to determine the appropriate sample size 
for a lot. First, Table A-2 is entered using the lot size. Assuming “normal” inspection, 
which is inspection level IV, the table gives a sample size code letter. Then, for double 
specification limit, normal or tightened inspection, Table B-3 is entered for the appropriate 
sample size and maximum percent defective, M, for the chosen AQL. If sampling other 
than “normal” is needed, Table A-2 gives different sample size code letters, and Table B-3 
can be read from the bottom for tightened inspection. Here it is important to note that 
MIL-STD-414 is very specific about the sample size needed for a given lot. 

By contrast, WSDOT’s Standard Specification typically results in at least 5 sublots 
of about 500 tons each for a minimum of 5 samples. Specifically it states: 


...For the purpose of acceptance sampling and testing, a lot is 
defined as the total quantity of material or work produced for each job mix 
formula (JMF), placed and represented by randomly selected samples 
tested for acceptance. All of the test results obtained from the acceptance 
samples shall be evaluated collectively and shall constitute a lot. Only one 
lot per JMF will be expected to occur... 

...The quantity represented by each sample will constitute a sublot. 
Sampling and testing for statistical acceptance shall be performed on a 
random basis at the frequency of one sample per sublot, with a minimum of 
five sublots per class of mix. Sublot size shall be determined to the nearest 
100 tons to provide not less than five uniform sized sublots, based on 
proposal quantities, with a maximum sublot of 800 tons. 

Sampling and testing for nonstatistical acceptance shall be 
performed on a random basis at a minimum frequency of one sample for 
each sublot of 400 tons or each day's production, whichever is least 
[Standard, 1994, 5-04.3(8)A, p 5-22] 

This demonstrates the difference between MIL-STD-414’s scheme for determining 
sample size and WSDOT’s Standard Specification. There is no clear link between the two 
standards for determining the number of samples needed per lot because WSDOT’s 
sampling methodology is based on the non-central t-distribution, which takes into account 
the effects of small sample sizes. 


Pay Factor Tables/Curves 

Earlier the concept of accepting material or work that was below the acceptable 
quality level, but above the rejectable quality level, was introduced. The idea is that since 
the material is not up to the minimum level of quality, but better than rejectable, it should 
receive less pay. The amount of that reduction is based on the amortized value of work 
which will be needed to repair or replace the defective material at some point earlier than 
if it had been of better quality. In other words, the necessity for repair or replacement 
has a real cost associated with an earlier than programmed maintenance schedule. The 
amount withheld from the contractor is theoretically set aside to cover the costs of the 
expected premature replacements or repairs. This assumes that the costs associated with 
poorer quality material or work have been quantified sufficiently to make equitable and 
realistic adjustments to a contractor's pay. 


Description 
There are two general approaches to implementing a pay scheme that will pay a 
bonus for superior quality work, 100% at the acceptable quality level, and reduced pay 
down to the rejectable quality level: a pay factor table or a pay 
formula. Table 4-1 is a sample taken from the FP-85 that is duplicated in WSDOT's 
1994 Standard Specification. 


Potential Problems 

The pay factor tables in the FP-85 and WSDOT Standard Specification have a 
footnote indicating that if the computed quality level does not exactly match the value in 
the table for a given sample size, then the next lower pay factor should be used. Figure 3-6 
demonstrates graphically how these "stepped" pay functions work. An alternative to 
such tables is presented below. 
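
The "next lower pay factor" rule in the footnote can be illustrated with a small lookup. This is a sketch only: the table rows are hypothetical values chosen for the example, not the FP-85 or WSDOT entries, and returning None below the lowest row is an assumption about how such a case might be flagged.

```python
from bisect import bisect_right

# Hypothetical rows for one sample size:
# (minimum quality level required, pay factor). Not the actual table values.
PAY_TABLE = [(60.0, 0.75), (70.0, 0.85), (80.0, 0.95), (90.0, 1.00), (100.0, 1.05)]

def stepped_pay_factor(quality_level, table=PAY_TABLE):
    """Return the pay factor for the highest row whose required level is met.

    A quality level falling between two rows drops to the lower row,
    which is what produces the "steps" in the pay function.
    """
    thresholds = [level for level, _ in table]
    index = bisect_right(thresholds, quality_level) - 1
    if index < 0:
        return None   # below the lowest row; treatment here is an assumption
    return table[index][1]

print(stepped_pay_factor(90.0))    # meets the 90 row exactly
print(stepped_pay_factor(89.99))   # misses the 90 row by a hundredth of a point
```

The second call shows the kind of near miss that can lead to disputes under a stepped table.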


Alternatives 

The alternative to a pay table such as Table 4-1 is a pay 
formula which makes a smooth progression from bonus pay for superior work to 
substantially reduced pay at the RQL. Presumably for the pay curve to operate properly, 
it must pass through, or very close to, 1.00 at the AQL, and the lowest pay factor 
allowable under contract at the RQL. As will be demonstrated later, the pay scheme must 
allow for a bonus as well as reduced pay for it to operate properly. The slope of the curve 
should match as closely as possible the reductions in pay that are needed at lower quality 
levels to sufficiently cover the costs of future repair or replacement as mentioned above. 
The added advantage of a pay formula over pay tables is that there can be no dispute over 
a higher pay factor that might have been missed by only a few hundredths of a point. 
There are no steps to dispute because the pay factor equation simply indicates a point on 
the pay curve somewhere between perfect and rejectable quality. 

WSDOT 1994 Standard Specification Issues 

Although WSDOT’s Standard Specification text does not match exactly the FP-85, 
the acceptance plans are essentially the same since calculating mean, standard 
deviation, and quality indexes is the same, and quality level and pay factor tables are 
identical. MIL-STD-414 uses mean, standard deviation, and quality indexes to estimate 
lot quality, which is an accepted standard. The potential problems arise from the FP-85, 
which WSDOT uses as a source, where it makes a misleading statement relating quality 
level, pay factors, and their relationship to risk. It is incorrect to assume that acceptance 
plan OC Curves and pay curves are the same thing unless they are specifically linked, as in 
OCPLOT’s computer simulation or NONCENTT [Barros, 1982]. Pay curves and OC 
Curves represent two very distinct aspects of QA methodology. This will be 
explained in further detail below. 




Excerpt from FP-85 
The FP-85 makes the following statement in describing acceptance plan behavior, 
risk, and pay factors: 


Quality Level Analysis is a statistical procedure for estimating the percent 
compliance to a specification and is affected by shifts in the arithmetic 
mean ( X ) and by the sample standard deviation (s). Analysis of each test 
parameter will be based on an Acceptable Quality Level (AQL) of 95.0 and 
a producer's risk of 0.05. AQL may be viewed as the lowest percent of 
specification material that is acceptable as a process average. The 
producer's risk is the probability that when the Contractor is producing 
material exactly at the AQL, the materials will receive less than a 1.00 pay 
factor [FP-85, 1985, p 46]. 







The fact that the AQL is 95% and the producer’s risk, α, is 5% may be misleading 
without additional explanation. That the two in this case happen to add up to 100% is 
coincidental, and it should not be assumed that this is a normal aspect of acceptance 
sampling. As an example, it would be just as correct to say the AQL is 5% defective, and 
that the producer's risk is 5%, which do not add to 100%. 

The FP-85 is also making a sweeping assignment of risk for all quality 
characteristics the standard may be used to examine, but no specific statement is made 
concerning the consumer's, or β, risk. In other words, according to the statement above, it 
is assigning a single level of risk to all characteristics being measured regardless of 
criticality. As was mentioned earlier, at least two points on the OC Curve are needed to 
design the plan. If only the α risk is specified, there is no way to “nail down” the other 
end of the OC Curve at the RQL for a given sample size given the methods and 
explanations of statistical sampling thus far. NONCENTT will allow “nailing down” the 
RQL end of the curve provided sample size is allowed to adjust, or “float”, accordingly. 
The FP-85 also does not mention when it would be appropriate to vary the α and β risks 
depending on the criticality of the characteristic. For example, is it more important to 
control fines passing the Number 200 sieve more closely than those passing the Number 8 
sieve? 

The excerpt above also implies that the contractor is to use the 95% AQL as a 
specification standard; however, the contractor should apply whatever quality control is 
necessary and economical to his process to maximize pay. This should assure that the 
quality will be equal to or better than the AQL. The AQL is simply a tool for the owner, not 
the contractor, to make an informed decision concerning lot acceptance. 

The last sentence in the excerpt implies a situation the plan designers most 
certainly do not want. That the contractor suffers a 5% risk of receiving a pay factor of 
less than 1.00 if he is producing exactly at the AQL is also saying that 95% of all pay 
factors assigned to the contractor have to be greater than 1.00. This is illustrated in 


Figure 4-1 below. 


FP-85 Plan 





Figure 4-1 Graphical Representation of 5% Risk of Pay Factor < 1.00 


Beyond the unwanted situation where 95% of the pay factors will be greater than 1.00 for 
work at the AQL, it is very difficult to determine what the average pay factor ends 
up actually being. Weed has not only noted this problem, but has developed a program 
(OCPLOT, described earlier) that will quantify it. After inputting certain plan parameters, 
including the pay formula, the program enables the plan designer to determine what the 
average pay factor will be over a range of quality including the plan’s AQL. After 
manipulating the information from the FP-85 and WSDOT’s pay factor tables, OCPLOT 
determined that the pay tables pay a bonus at all sample sizes, up to and over 104%, for 
material at the AQL. WSDOT believes that because of this, the contractors factor the 
bonus into their bids, thereby holding contract bid prices either relatively flat or slightly 
lower over time. Presumably what the plan designers had really intended is illustrated in 
Figure 4-2: that if the process is exactly at the AQL, the pay factor should instead be 1.00, 
and that the contractor is at 5% risk that the AQL work might be rejected. 


Process Pay Factor at AQL 





Figure 4-2 Process Exactly at AQL, therefore Pay Factor = 1.00 


The plan designer must recognize that for a plan to operate properly, it will be paying a 
bonus 50% of the time, and a penalty 50% of the time when the process is operating 
exactly at the AQL. It is not possible to pay a pay factor of 1.00 on average, at the AQL 


under any other circumstances. This is illustrated in Figure 4-3. 

















[Figure 4-3 labels: 50% of pay factors on each side of PF = 1.00] 


Figure 4-3 Reality of Pay Factors at AQL 


The acceptance plan must pay a bonus for superior quality work for it to operate 
properly. It is not reasonable to expect a contractor to be able to produce completely 
defect free material and work, nor is it possible for a sampling plan to never make errors in 
determining lot quality. Recognizing now how a contractor’s process can be visualized, 


Figure 4-4 demonstrates what happens when there is no bonus provision. 


No Bonus Provision 

















Figure 4-4 Results of No Bonus Provision 

This plan would result in the contractor being unduly penalized since his process would on 
average be paid less than 1.00 for work at the AQL. Again, OCPLOT is a good tool to 
determine what the average pay factor would be at the AQL. 

Note that Figure 4-2 represents two related, but very different aspects of 
acceptance sampling: the contractor's process, and the pay as a result of the sample 
quality. Likewise, the OC Curve and the pay curve are two related but 
very different aspects of the acceptance plan. The OC Curve is a graphical representation 
of the discriminating power of the sampling plan. The pay curve represents the 
progression of pay factors from a bonus at superior quality, to 1.00 at the AQL, to 
reduced pay down to the RQL. The pay curve cannot be used to determine the 
discriminating power of the acceptance plan, nor can it be used to determine risk. 


Flatter (Less Discriminating) OC Curves 

It should be noted again that the discriminating power of the acceptance plan is 
markedly affected by the sample size. As the sample size increases, the discriminating 
power goes up. In other words, the larger the sample, the greater the confidence in its 
results, which means the probability of accepting substandard material should decrease. 
For relatively smaller sample sizes, the OC Curve will be flatter over the quality spectrum 
which means that the chance that good material might be rejected is held to a minimum. 


This is illustrated in Figure 4-5 below. 





Effects of Sample Size 

Figure 4-5 Effect of Sample Size on the Discriminating Power of the Plan 


Although the curves in Figure 4-5 were created using formulas for an attributes plan, the 
concept is the same for variables. One of the biggest advantages of a variables plan is that 
smaller sample sizes can be used for the same level of confidence. It should be noted, 
however, that WSDOT's Standard Specification allows for sample sizes as small as “5 
sublots per class of mix.” 
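
The effect of sample size on an attributes OC Curve can be sketched with the binomial formula for probability of acceptance. The two plans below (sample size n, acceptance number c) are hypothetical and were chosen only so that c/n is the same in both; they are not the plans used to draw Figure 4-5.

```python
from math import comb

def prob_accept(p, n, c):
    """Binomial probability of finding c or fewer defectives in a sample of n
    when the lot fraction defective is p."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

# Two hypothetical attributes plans with the same ratio c/n.
small_plan = (20, 1)
large_plan = (100, 5)

for p in (0.01, 0.05, 0.10, 0.20):
    pa_small = prob_accept(p, *small_plan)
    pa_large = prob_accept(p, *large_plan)
    print(f"p={p:.2f}  n=20: {pa_small:.3f}  n=100: {pa_large:.3f}")
```

In this sketch the curve for the larger plan falls away much more steeply as the fraction defective rises, which is the greater discriminating power referred to above.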





Chapter 5 
New Jersey DOT QA Research and OCPLOT 

The New Jersey Department of Transportation began implementing statistical 
quality assurance specifications in the late 1960’s. A brief overview of how their program 
evolved provides insight into developing a QA specification, and why OCPLOT was 
developed as a tool to that end. 

A better method for describing quality was desired and, as with most new procedures, 
the approach has been evolving ever since. The concepts of acceptable and unacceptable 
work were expressed in terms of the average value in relation to the specification limits. 
However, this method did not take variability into account and it was soon discovered that 
on average the material could be judged acceptable even though a substantial amount was 
out of specification. Here the NJDOT realized the importance of controlling variability, 
thus implementing specifications based on the variables sampling procedures described in 
MIL-STD-414. The added bonus was that this was a more efficient procedure requiring 
fewer samples. It was also found that pay equations had benefits over pay tables in that 
disputes were avoided by eliminating pay “steps”. More recently it was discovered that 
for an acceptance procedure to operate fairly, it would have to provide a bonus provision 
for reasons cited earlier. In many cases a linear pay equation was sufficient, but to provide 
adequately low pay at the AQL sometimes also required the bonus to be unusually high. 
OCPLOT proved to be a valuable tool in developing the DOT’s new specifications. To 
allow the contracting community to become familiar with the new specifications, New 


Jersey has implemented a policy whereby pay factor deductions are reduced by one half as 
the specification is phased in [Weed, 1994]. Weed's 1994 “Development of Air Voids 
Specifications for Bituminous Concrete” is an excellent example of the evolution of QA 
specifications and methods for implementation. 

OCPLOT’s approach is to use computer simulation to estimate lot quality. It is 
important to note here that this is the only approach besides NONCENTT which makes it 
possible to directly relate the acceptance plan performance and the resulting pay factors 
generated for varying levels of lot quality. 

The first menu following the introductory screens allows the user to input the 
various features and parameters of the acceptance plan. This includes, but is not limited to 
whether it is pass/fail or pay adjustment, single or double specification limit, what is the 
desired pay equation, AQL, RQL, and sample size. The menu items appear in a logical 
sequence and build upon one another depending on the plan. In other words, if an 
attributes plan or linear pay equation was desired, a different set of questions would have 
followed. After the plan parameters have been typed in, the user must select a level of 
precision desired for the simulation. The simulation process is very computationally 
intensive, so the type of computer used affects which 
level is practical. A 386 SX-20 laptop computer was used at all three levels; the high 
precision level took an unacceptably long time to generate all the data sets. A desktop 90 
MHz Pentium was also tried, and it was fast enough that it did not matter 
which level was selected. The way the simulation works is that a series of samples of the 
size designated by the user in the preliminary screen are taken from a randomly generated 
universe of normally distributed data at each level of quality from the AQL down to the 
RQL in increments of approximately 5%. The low precision method generates 200 sample 
sets, the intermediate level generates 1000 sets, and the high precision level generates 
5000 sample sets. For example, if a user had specified a sample size of 5, and had chosen 
the low level of precision for faster execution, and had specified an AQL of 5% defective 
and RQL of 25% defective, the simulation would produce 200 randomly selected sample 
sets of 5 each from a population that was 5% defective, then 10% defective, and so on 
down to 25% defective. This is illustrated in Figure 5-1 below. 














25% Defective 
Universe 





Figure 5-1 OCPLOT Simulation 

Each sample of 5 would then be analyzed and the results compared to the boundaries 
specified in the preliminary screen. This way a “tally” can be made of the number of 
“items” that fall within each quality level increment between perfect, down to and beyond 
the RQL. The results are averaged and reveal the expected performance at the AQL and 
RQL. The corresponding pay factors are matched to these levels of expected 
performance. This way an average pay factor can be determined for each level of 
population quality. Since the plan designer is generally interested in what happens at the 
AQL and RQL, separate screens, among other things, give detailed information showing 
performance and pay factor histograms, and an operating characteristic curve showing the 
relationship between quality and expected pay factor. It is important to note that this 
operating characteristic curve cannot be equated with those discussed in the rest of this 
paper. The reason this curve can be called an OC Curve even though it shows expected 
pay instead of probability of acceptance is that the simulation directly links plan 
performance and corresponding pay at that performance level. 
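
The general idea of the simulation can be sketched as follows. This is not OCPLOT's code: it is a simplified illustration that assumes a single upper specification limit, estimates percent defective from the quality index using the area under a normal curve rather than the specification's tables, and uses a hypothetical linear pay equation PF = 105 - 1.0(PD). All function names are invented for the sketch.

```python
import random
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()   # population with mean 0 and standard deviation 1

def linear_pay(pd, a=105.0, b=1.0):
    """Hypothetical linear pay equation PF = a - b * PD."""
    return a - b * pd

def simulate_average_pay(true_pd, n, sets, pay_equation, seed=1):
    """Average pay factor over many random samples from a population that is
    true_pd percent defective above a single upper specification limit."""
    rng = random.Random(seed)
    # Place the limit so that exactly true_pd percent of the population exceeds it.
    upper_limit = STD_NORMAL.inv_cdf(1.0 - true_pd / 100.0)
    pays = []
    for _ in range(sets):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        q = (upper_limit - mean(sample)) / stdev(sample)    # quality index
        est_pd = 100.0 * (1.0 - STD_NORMAL.cdf(q))          # simplified estimate
        pays.append(pay_equation(est_pd))
    return mean(pays)

# Sample size 5, low precision (200 sets), quality from 5% to 25% defective.
for true_pd in (5, 10, 15, 20, 25):
    print(true_pd, round(simulate_average_pay(true_pd, 5, 200, linear_pay), 2))
```

Printing the average pay at each quality level gives one point per level on an expected-pay curve of the kind described above.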


Uses for Developing Acceptance Plans 

OCPLOT is a powerful tool available for both plan designers and administrators. 
It allows not only the opportunity to predict how well a plan will perform with respect to 
pay and quality level, but it also allows a straightforward analysis of existing plans. This is 
how the FP-85 and WSDOT’s pay factor tables were analyzed. 

The program does not allow the user to directly assign a desired α and β risk. The 
only way the OC Curve can be manipulated is through the pay equation, sample size, and 
AQL/RQL. In other words, the program will allow the user to modify the pay equation 
and other parameters so that the plan may pay a bonus for superior quality, exactly 1.00 at 
the AQL and reduced pay from there, but it does not report what the risk is to the 
contractor or agency at the AQL or RQL. 

OCPLOT can provide an agency the opportunity to avoid unwittingly permitting 
situations to develop where an acceptance plan pays too much on average at the AQL, or 
just as importantly, where it is unduly harsh on contractors. The power of OCPLOT lies 
in its ease of use, its analytical power, and its ability to bring substance to a more esoteric 


part of applied statistics. 


Regression a Potential Tool for Developing Pay Curves 

If an agency has determined that a pay scheme using tables instead of a pay 
formula has worked well in the past, it may be desirable not to change to using a formula 
despite its inherent advantages. An alternative would be to use regression to curve fit the 
pay data for each sample size. This way the mechanics of the pay table would be 
preserved, and the advantages of using a pay formula would be added. Spreadsheet 
programs such as Excel have built-in data analysis tools that make this an easy task and 
can be made to display the formula and R² value. The only disadvantage to this technique 
is that regression will likely provide a formula that would not allow analysis using 
OCPLOT. If it is desired to use OCPLOT for analysis, the plan designer has no choice 
but to use OCPLOT’s linear or nonlinear formats. The process is simply to use the 
percent defective, or percent within limits, at the AQL and RQL with their corresponding 
pay factors, and solve for the constants using two equations and two unknowns. This 
technique is demonstrated in Appendix B. 
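
A regression fit of the kind described can be sketched without a spreadsheet. The example below fits a straight line by ordinary least squares and reports R²; the percent defective and pay factor pairs are hypothetical, not values from the WSDOT or FP-85 tables, and a straight line is used only to keep the sketch short.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x, with R squared."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return intercept, slope, 1.0 - ss_res / ss_tot

# Hypothetical pay table rows for one sample size: (percent defective, pay factor).
percent_defective = [0, 5, 10, 15, 20, 25]
pay_factors = [1.05, 1.00, 0.95, 0.90, 0.82, 0.75]

intercept, slope, r_squared = fit_line(percent_defective, pay_factors)
print(f"PF = {intercept:.4f} + ({slope:.5f}) * PD,  R^2 = {r_squared:.4f}")
```

A fitted intercept and slope of this kind map directly onto A and B in the linear format; a higher order or otherwise different curve would not.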





Chapter 6 
CONCLUSIONS AND RECOMMENDATIONS 
The original goal of this report was to determine risks assigned to the contractors 
and WSDOT in their asphalt QA specifications, and examine the pay factors used by 
WSDOT in its sampling plan. By doing so the study provided insight into statistically 
based sampling plans, including pitfalls that may be encountered. 


Determination of Risk for WSDOT's 1994 Specification 

After determining the statistical model which best fits WSDOT’s specifications, 
and comparing the differences in sampling to MIL-STD-414, it was concluded that it is 
not possible to accurately determine WSDOT's risk using MIL-STD-414's OC Curves 
and associated data distributions. By the specification, the contractor's risk was 
presumably fixed at 5% for all quality characteristics. MIL-STD-414 includes a battery of 
OC Curves which can be used to determine exactly what the risks would be for different 
sample sizes and levels of inspection, but WSDOT’s specification departs enough from 
this standard that the OC Curves no longer apply. As was found late in the study, this was 
by design. The primary reasons for the differences are that WSDOT uses different 
sample size categories than MIL-STD-414 for determining quality level (which in reality is 
probably a minor difference), and especially that the sample size is determined 
differently than in MIL-STD-414. The importance of sample size is emphasized in Figure 4-5. 
Although WSDOT’s Standard Specification does not mirror the language of the FP-85, 
the fact that it uses the same quality and pay tables indicates the plan design is the same. 
That the FP-85 only indicates a seller’s risk also contributes to the uncertainty of what the 
buyer’s or agency’s risk would be for varying sample sizes. As was discussed earlier, to 
properly design an OC Curve, two points, usually the AQL and RQL, are needed. 

At first it appears the risks could be determined directly from the hypergeometric 
formula, but this is not the case. Although it is true that the way a lot is defined by 
WSDOT accurately fits the hypergeometric model, the data collected is continuous, not 
discrete. That effectively eliminates the opportunity to use the hypergeometric formula to 
“solve for risk”. Again, it is worth mentioning that the material used for testing one 
sample, in proportion to the lot size, is for all practical purposes like sampling from an infinite 
universe, with the results closely approximating a normal distribution. 

For a single specification limit plan, it might appear to be possible to “work 
backwards” using the pay tables, and the nomograph. But this is not possible because the 
pay table does not indicate at a pay factor of 1.00 what the true acceptable quality level is. 
This is because the pay tables are based on the non-central t-distribution, which 
compensates for small sample sizes in determining lot quality. If it were assumed that the 
AQL could be read directly from the pay tables, there would be a different AQL for each 
sample size, which is in direct conflict with the statement made in the FP-85. Virtually all 
the quality characteristics examined in WSDOT’s specification are double specification 
limit items, which would require the use of the M method. The nomograph uses fraction 
defective and probability of acceptance at the AQL and RQL to determine n and k. The k 
method essentially only works for single specification limit plans. As pointed out earlier, 
the M method can be used for either single specification or double specification limit plans 
since it uses area under the curve not to exceed, rather than minimum distance between 
sample mean and specification limit. Figure 2-7 in Chapter 2 indicates that the area under 
the curve beyond the upper and lower specification limits can shift but still sum to the 
same quantity. This means that there is an infinite number of combinations for material 
that may be out of specification outside the upper and lower limits, which results in a band 
of OC Curves [Duncan, 1986, pp 282, 283]. Consequently it would be impossible to 
work backwards from the information contained in WA-RD 326.1 to determine risk. 
There is another formula for probability of acceptance for a variables sampling plan 
[Duncan, Equations 12.1 and 12.2, pp 276-279], but again this is for the k method only 
and requires that n and k be known. n can be read directly from the pay factor table, but 
finding k is still not possible. 

The NONCENTT computer program remains the best alternative for determining 
the risks in WSDOT’s specification. It is based on the non-central t distribution, which 
compensates for errors in determining lot quality based on small sample sizes. It provides 
a quantitative means for determining how much more out of specification material can be 
allowed relative to the sample size, while still ensuring that the lot is of some minimum 
quality. This is clearly seen in WSDOT's pay factor table, where the allowable percent 
defective is larger for small samples, and decreases as the sample size gets larger for each 
pay factor. As the number of samples approaches infinity, the allowable out of 
specification material will approach the AQL for a pay factor of 1.00. Because 
NONCENTT was not discovered until late in the research, it was not used. 

How Nomographs May be Used 

The nomographs in Chapter 3, Figure 3-1 and Figure 3-2, are designed as a shortcut 
for a plan designer. Beginning with the fractions defective desired at the AQL and 
RQL for predetermined levels of risk, or probabilities of acceptance, the nomographs will 
yield a sampling plan with a given sample size n and acceptance number c, or sample size n 
and minimum k, respectively. Only if appropriate information is known, such as fractions 
defective, sample size and k, can the nomograph be used in reverse to find risk, and then 
only as long as the plan adheres to accepted methodology such as in MIL-STD-414. 
Otherwise it may yield two lines that do not intersect inside the nomograph. 


WSDOT Pay Factors 
Using OCPLOT to analyze the pay factors found in the WSDOT specification 
indicated that on average Washington is paying more than it should for AQL work. For 
sample sizes smaller than 20, the pay factor averaged about 104% for AQL work, and 
103% to 101% for samples up to and over 200. This was assuming that AQL work 
allowed 5% defective for all quality characteristics, and that the shape of the pay curve 


was dictated by the range of quality levels shown in the pay factor table. 


Development Standards 

There are many alternatives available for statistical sampling that a plan designer 
may use. The best approach to understanding statistical sampling is to study it from more 
than one source such as Duncan, Montgomery, and Blank. It is useful to see more than 


one author’s perspective because certain aspects which are unclear in one reference may 
be explained more easily in another. MIL-STD-414 and MIL-STD-105 provide a 
variables and attributes sampling approach, respectively. Special circumstances may 
dictate that using one or both of these standards may be inappropriate or expensive. If the 
designer follows the guidelines in the textbooks named above, it will still be possible to 
create a plan with specified OC Curves. 

In situations where large sample sizes are prohibitively expensive, NONCENTT is 
worthy of consideration as a tool for creating a viable sampling plan. WSDOT's pay 
factor tables are based on the non-central t distribution, which was not examined in this 
study. 


Use Pay Formula vs Tables 

Pay tables are easier to apply than a pay formula, but inaccuracies can occur. The
easiest way to avoid such inaccuracies is to use a pay formula. A step-by-step worked
example in the specification demonstrating how to use the formula should avoid any
confusion in usage. The formula should maintain the bonus provision, and be developed
using OCPLOT to ensure it pays 1.00 at the AQL and 0.75 at the RQL. It may not be possible to achieve
these pay factors exactly, but “close” is adequate as long as the pay curve behaves as the


plan designer wishes. 
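
As an illustration of how “close” a fitted formula can be, the following worked check uses the n=5 coefficients derived in Appendix B (B = 0.0182, C = 1.8163), with PF, PD, and A expressed in percent. The intermediate values are rounded, so this is only a sketch of the arithmetic, not a replacement for an OCPLOT analysis.

\[
\begin{aligned}
PF(22) &= 105 - 0.0182\,(22)^{1.8163} \approx 105 - 0.0182\,(274.3) \approx 105 - 5.0 = 100.0 \\
PF(59) &= 105 - 0.0182\,(59)^{1.8163} \approx 105 - 0.0182\,(1645.9) \approx 105 - 30.0 = 75.0
\end{aligned}
\]

The small residuals come from rounding B and C to four decimal places.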


Explanation of Sampling Plans 
It is human nature to be hesitant about agreeing to terms that are not clearly
understood. Statistical sampling plans can be mysterious without an explanation of key
concepts such as those presented in this paper. Performance specifications which are
successful in clearly describing the scope of work and site conditions invariably avoid
more claims than those which are less clear or less detailed. It stands to reason, then, that a
specification which intends to use statistical methods for acceptance and payment would
benefit from the contractors having a full and clear understanding of the underlying
statistical methods and risks. An alternative would be to make available an explanation of
statistical sampling and its related concepts in a separate publication which could be
referenced in the standard specification.


Recommendations Specific to WSDOT 

Since the pay tables in WSDOT’s specification were found to have been developed using
NONCENTT, it would be worth studying this program to determine accurately, and
describe more fully, WSDOT’s specification risks. Other future work WSDOT may wish to
consider is the point of sampling of the asphalt concrete. Though not examined in this
study, obtaining samples from the paver hopper versus the truck bed should be considered.


APPENDIX A: TERMINOLOGY


Acceptance Number: The maximum number of defective items allowed before a lot is 
rejected. Generally, this term implies a relationship to attributes sampling plans. 

Acceptance Plan: A statistically based sampling plan. 

α Risk: Also Alpha Risk, Seller's Risk, or Type I error. This is the chance that an owner
will reject material that should be accepted.

AQL: Acceptable Quality Level. This is the minimum level of quality that a plan designer
wishes to permit, and still pay the contractor 100% of his bid price. This level of
quality is an admission that for non-critical characteristics, it is unreasonable to
expect zero defects. It is the level of defects that will not have an appreciable
impact on the performance, or life of the material.

β Risk: Also Beta Risk, Buyer’s Risk, or Type II error. This is the chance that an owner
will accept material that should be rejected.

Double Sampling: A procedure where a second sample may be required before a lot can 
be sentenced. If a sample percent defective is greater than that which would allow 
unquestioned acceptance, but lower than that which would require outright 
rejection, a second sample would be taken. 

Fraction Defective: Usually expressed as a percentage, this term is typically used to 
describe how defective the population is from which the lots and samples are 
drawn. 

MIL-STD: Military Standard 

OC Curve: Operating Characteristic Curve. This curve is a graphical representation of 
the discriminating power of a sampling plan. It shows the probability of 
acceptance of material exhibiting a spectrum of quality from no defects to very 
defective from which the lots and samples are drawn. 

Percent Defective: A term used to describe the quantity defective in a sample. 

QA: Quality Assurance. This term implies performance, or outcome based specifications.
Many times QA also implies that statistical methods will be employed in contract
enforcement.

RQL: Rejectable Quality Level. This is the threshold level of quality below which
material will be rejected or replaced. Between the AQL and RQL, the contractor
will still be paid, but at a progressively lower factor down to the RQL.

Sentencing: A judgment, based on sample results, of what the disposition of the lot should be.

Single Sampling: One sample of one to thousands of items. In single sampling, a lot is


sentenced based on the results of that one sample. 
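
The terms above (Acceptance Number, α Risk, β Risk, OC Curve, Single Sampling) can be tied together with a short numerical sketch. The following Python example assumes a single attributes sampling plan in which each sampled item is independently defective with probability equal to the fraction defective, so the number of defectives is binomial. The plan (50 items, acceptance number 5) and the quality levels (AQL of 5% defective, RQL of 20% defective) are hypothetical values chosen only for illustration; they are not taken from the WSDOT specification or from MIL-STD-105.

```python
from math import comb


def prob_accept(n, c, p):
    """Probability of accepting a lot under a single attributes plan.

    n: number of items in the sample
    c: acceptance number (maximum defectives allowed in the sample)
    p: fraction defective of the population, between 0 and 1
    """
    # Binomial probability of finding c or fewer defectives in n items.
    return sum(comb(n, d) * p**d * (1.0 - p) ** (n - d) for d in range(c + 1))


def oc_curve(n, c, fractions):
    """Return (fraction defective, probability of acceptance) pairs."""
    return [(p, prob_accept(n, c, p)) for p in fractions]


# Hypothetical plan and quality levels, for illustration only.
SAMPLE_SIZE = 50
ACCEPT_NUMBER = 5
AQL = 0.05  # fraction defective still considered acceptable
RQL = 0.20  # fraction defective considered rejectable

# Alpha risk: chance of rejecting material that is exactly at the AQL.
alpha_risk = 1.0 - prob_accept(SAMPLE_SIZE, ACCEPT_NUMBER, AQL)
# Beta risk: chance of accepting material that is exactly at the RQL.
beta_risk = prob_accept(SAMPLE_SIZE, ACCEPT_NUMBER, RQL)

if __name__ == "__main__":
    for p, pa in oc_curve(SAMPLE_SIZE, ACCEPT_NUMBER, [i / 20 for i in range(9)]):
        print(f"fraction defective {p:.2f}: P(accept) = {pa:.3f}")
    print(f"alpha risk at AQL: {alpha_risk:.3f}")
    print(f"beta risk at RQL:  {beta_risk:.3f}")
```

Plotting the pairs returned by `oc_curve` gives the OC Curve for the plan; a steeper curve between the AQL and RQL indicates greater discriminating power.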





APPENDIX B: PAY FORMULA DERIVATIONS 


Sample Size n=3 


Percents Defective (100-Quality Level) from Table 4-1 at a Pay Factor of 1.00 and 0.75 


PF=1.00 PD=0.32
PF=0.75 PD=0.67
A=1.05

Then using the nonlinear equation from OCPLOT: $PF = A - B(PD)^{C}$

\[
\begin{aligned}
100 &= 105 - B(32)^{C} &\qquad 75 &= 105 - B(67)^{C} \\
B(32)^{C} &= 5 &\qquad B(67)^{C} &= 30 \\
\log B + C \log 32 &= 0.699 &\qquad \log B + C \log 67 &= 1.477
\end{aligned}
\]

\[
\begin{aligned}
0.699 - C \log 32 &= 1.477 - C \log 67 \\
C \log 67 - C \log 32 &= 0.778 \\
C(\log 67 - \log 32) &= 0.778
\end{aligned}
\]

C = 2.4244
B = 0.0011

\[ \therefore\ PF = 105 - 0.0011(PD)^{2.4244} \]


Note: (1) Decimals are carried through all calculations and are only rounded at the end. 


(2) Pay Factors, Percents Defective, and A are multiplied by 100 in the equations. 
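
The elimination used above can be sketched in a few lines of Python. The function below is an illustrative helper, not part of OCPLOT; its name and parameters are assumptions made for this example. It solves PF = A - B(PD)^C for B and C given the two percents defective that should pay 100 and 75, with all quantities in percent as in Note (2).

```python
from math import log10


def fit_pay_curve(pd_full, pd_rql, a=105.0, pf_full=100.0, pf_rql=75.0):
    """Fit PF = a - B * PD**C through two (PD, PF) points, all in percent.

    pd_full: percent defective that should earn pf_full (100)
    pd_rql:  percent defective that should earn pf_rql (75)
    Returns the pair (B, C).
    """
    drop_full = a - pf_full  # B * pd_full**C, equal to 5 in the appendix
    drop_rql = a - pf_rql    # B * pd_rql**C, equal to 30 in the appendix
    # Subtracting the two logarithmic equations eliminates log B.
    c = (log10(drop_rql) - log10(drop_full)) / (log10(pd_rql) - log10(pd_full))
    # Back-substitute into the first equation to recover B.
    b = drop_full / pd_full**c
    return b, c


if __name__ == "__main__":
    b, c = fit_pay_curve(32, 67)  # n=3 inputs from Table 4-1
    print(f"B = {b:.4f}, C = {c:.4f}")
```

Because this sketch carries full floating-point precision, its results may differ from the tabulated B and C in the last digit or two, depending on how the logarithms were rounded in the hand calculation.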





Sample Size n=4 


Percents Defective (100-Quality Level) from Table 4-1 at a Pay Factor of 1.00 and 0.75 


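PF=1.00 PD=0.26
PF=0.75 PD=0.62
A=1.05

Then using the nonlinear equation from OCPLOT: $PF = A - B(PD)^{C}$

\[
\begin{aligned}
100 &= 105 - B(26)^{C} &\qquad 75 &= 105 - B(62)^{C} \\
B(26)^{C} &= 5 &\qquad B(62)^{C} &= 30 \\
\log B + C \log 26 &= 0.699 &\qquad \log B + C \log 62 &= 1.477
\end{aligned}
\]

\[
\begin{aligned}
0.699 - C \log 26 &= 1.477 - C \log 62 \\
C \log 62 - C \log 26 &= 0.778 \\
C(\log 62 - \log 26) &= 0.778
\end{aligned}
\]

C = 2.0618
B = 0.0060

\[ \therefore\ PF = 105 - 0.0060(PD)^{2.0618} \]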


Note: (1) Decimals are carried through all calculations and are only rounded at the end. 


(2) Pay Factors, Percents Defective, and A are multiplied by 100 in the equations.


Sample Size n=5


Percents Defective (100-Quality Level) from Table 4-1 at a Pay Factor of 1.00 and 0.75


PF=1.00 PD=0.22
PF=0.75 PD=0.59
A=1.05

Then using the nonlinear equation from OCPLOT: $PF = A - B(PD)^{C}$

\[
\begin{aligned}
100 &= 105 - B(22)^{C} &\qquad 75 &= 105 - B(59)^{C} \\
B(22)^{C} &= 5 &\qquad B(59)^{C} &= 30 \\
\log B + C \log 22 &= 0.699 &\qquad \log B + C \log 59 &= 1.477
\end{aligned}
\]

\[
\begin{aligned}
0.699 - C \log 22 &= 1.477 - C \log 59 \\
C \log 59 - C \log 22 &= 0.778 \\
C(\log 59 - \log 22) &= 0.778
\end{aligned}
\]

C = 1.8163
B = 0.0182

\[ \therefore\ PF = 105 - 0.0182(PD)^{1.8163} \]


Note: (1) Decimals are carried through all calculations and are only rounded at the end. 


(2) Pay Factors, Percents Defective, and A are multiplied by 100 in the equations.


Sample Size n=6


Percents Defective (100-Quality Level) from Table 4-1 at a Pay Factor of 1.00 and 0.75


PF=1.00 PD=0.20
PF=0.75 PD=0.56
A=1.05

Then using the nonlinear equation from OCPLOT: $PF = A - B(PD)^{C}$

\[
\begin{aligned}
100 &= 105 - B(20)^{C} &\qquad 75 &= 105 - B(56)^{C} \\
B(20)^{C} &= 5 &\qquad B(56)^{C} &= 30 \\
\log B + C \log 20 &= 0.699 &\qquad \log B + C \log 56 &= 1.477
\end{aligned}
\]

\[
\begin{aligned}
0.699 - C \log 20 &= 1.477 - C \log 56 \\
C \log 56 - C \log 20 &= 0.778 \\
C(\log 56 - \log 20) &= 0.778
\end{aligned}
\]

C = 1.7402
B = 0.0272

\[ \therefore\ PF = 105 - 0.0272(PD)^{1.7402} \]


Note: (1) Decimals are carried through all calculations and are only rounded at the end. 


(2) Pay Factors, Percents Defective, and A are multiplied by 100 in the equations.


Sample Size n=7


Percents Defective (100-Quality Level) from Table 4-1 at a Pay Factor of 1.00 and 0.75


PF=1.00 PD=0.19
PF=0.75 PD=0.54
A=1.05

Then using the nonlinear equation from OCPLOT: $PF = A - B(PD)^{C}$

\[
\begin{aligned}
100 &= 105 - B(19)^{C} &\qquad 75 &= 105 - B(54)^{C} \\
B(19)^{C} &= 5 &\qquad B(54)^{C} &= 30 \\
\log B + C \log 19 &= 0.699 &\qquad \log B + C \log 54 &= 1.477
\end{aligned}
\]

\[
\begin{aligned}
0.699 - C \log 19 &= 1.477 - C \log 54 \\
C \log 54 - C \log 19 &= 0.778 \\
C(\log 54 - \log 19) &= 0.778
\end{aligned}
\]

C = 1.7153
B = 0.0320

\[ \therefore\ PF = 105 - 0.0320(PD)^{1.7153} \]


Note: (1) Decimals are carried through all calculations and are only rounded at the end. 


(2) Pay Factors, Percents Defective, and A are multiplied by 100 in the equations. 





Sample Size n=8 


Percents Defective (100-Quality Level) from Table 4-1 at a Pay Factor of 1.00 and 0.75 


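PF=1.00 PD=0.18
PF=0.75 PD=0.53
A=1.05

Then using the nonlinear equation from OCPLOT: $PF = A - B(PD)^{C}$

\[
\begin{aligned}
100 &= 105 - B(18)^{C} &\qquad 75 &= 105 - B(53)^{C} \\
B(18)^{C} &= 5 &\qquad B(53)^{C} &= 30 \\
\log B + C \log 18 &= 0.699 &\qquad \log B + C \log 53 &= 1.477
\end{aligned}
\]

\[
\begin{aligned}
0.699 - C \log 18 &= 1.477 - C \log 53 \\
C \log 53 - C \log 18 &= 0.778 \\
C(\log 53 - \log 18) &= 0.778
\end{aligned}
\]

C = 1.6592
B = 0.0413

\[ \therefore\ PF = 105 - 0.0413(PD)^{1.6592} \]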


Note: (1) Decimals are carried through all calculations and are only rounded at the end. 


(2) Pay Factors, Percents Defective, and A are multiplied by 100 in the equations.


Sample Size n=9


Percents Defective (100-Quality Level) from Table 4-1 at a Pay Factor of 1.00 and 0.75


PF=1.00 PD=0.17
PF=0.75 PD=0.51
A=1.05

Then using the nonlinear equation from OCPLOT: $PF = A - B(PD)^{C}$

\[
\begin{aligned}
100 &= 105 - B(17)^{C} &\qquad 75 &= 105 - B(51)^{C} \\
B(17)^{C} &= 5 &\qquad B(51)^{C} &= 30 \\
\log B + C \log 17 &= 0.699 &\qquad \log B + C \log 51 &= 1.477
\end{aligned}
\]

\[
\begin{aligned}
0.699 - C \log 17 &= 1.477 - C \log 51 \\
C \log 51 - C \log 17 &= 0.778 \\
C(\log 51 - \log 17) &= 0.778
\end{aligned}
\]

C = 1.6309
B = 0.0492

\[ \therefore\ PF = 105 - 0.0492(PD)^{1.6309} \]


Note: (1) Decimals are carried through all calculations and are only rounded at the end. 


(2) Pay Factors, Percents Defective, and A are multiplied by 100 in the equations. 


Sample Size n=10 to 11 


Percents Defective (100-Quality Level) from Table 4-1 at a Pay Factor of 1.00 and 0.75 


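PF=1.00 PD=0.16
PF=0.75 PD=0.50
A=1.05

Then using the nonlinear equation from OCPLOT: $PF = A - B(PD)^{C}$

\[
\begin{aligned}
100 &= 105 - B(16)^{C} &\qquad 75 &= 105 - B(50)^{C} \\
B(16)^{C} &= 5 &\qquad B(50)^{C} &= 30 \\
\log B + C \log 16 &= 0.699 &\qquad \log B + C \log 50 &= 1.477
\end{aligned}
\]

\[
\begin{aligned}
0.699 - C \log 16 &= 1.477 - C \log 50 \\
C \log 50 - C \log 16 &= 0.778 \\
C(\log 50 - \log 16) &= 0.778
\end{aligned}
\]

C = 1.5725
B = 0.0639

\[ \therefore\ PF = 105 - 0.0639(PD)^{1.5725} \]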


Note: (1) Decimals are carried through all calculations and are only rounded at the end. 


(2) Pay Factors, Percents Defective, and A are multiplied by 100 in the equations.


Sample Size n=12 to 14


Percents Defective (100-Quality Level) from Table 4-1 at a Pay Factor of 1.00 and 0.75


PF=1.00 PD=0.15
PF=0.75 PD=0.49
A=1.05

Then using the nonlinear equation from OCPLOT: $PF = A - B(PD)^{C}$

\[
\begin{aligned}
100 &= 105 - B(15)^{C} &\qquad 75 &= 105 - B(49)^{C} \\
B(15)^{C} &= 5 &\qquad B(49)^{C} &= 30 \\
\log B + C \log 15 &= 0.699 &\qquad \log B + C \log 49 &= 1.477
\end{aligned}
\]

\[
\begin{aligned}
0.699 - C \log 15 &= 1.477 - C \log 49 \\
C \log 49 - C \log 15 &= 0.778 \\
C(\log 49 - \log 15) &= 0.778
\end{aligned}
\]

C = 1.5136
B = 0.0830

\[ \therefore\ PF = 105 - 0.0830(PD)^{1.5136} \]


Note: (1) Decimals are carried through all calculations and are only rounded at the end. 


(2) Pay Factors, Percents Defective, and A are multiplied by 100 in the equations.


Sample Size n=15 to 18


Percents Defective (100-Quality Level) from Table 4-1 at a Pay Factor of 1.00 and 0.75


PF=1.00 PD=0.14
PF=0.75 PD=0.47
A=1.05

Then using the nonlinear equation from OCPLOT: $PF = A - B(PD)^{C}$

\[
\begin{aligned}
100 &= 105 - B(14)^{C} &\qquad 75 &= 105 - B(47)^{C} \\
B(14)^{C} &= 5 &\qquad B(47)^{C} &= 30 \\
\log B + C \log 14 &= 0.699 &\qquad \log B + C \log 47 &= 1.477
\end{aligned}
\]

\[
\begin{aligned}
0.699 - C \log 14 &= 1.477 - C \log 47 \\
C \log 47 - C \log 14 &= 0.778 \\
C(\log 47 - \log 14) &= 0.778
\end{aligned}
\]

C = 1.4795
B = 0.1008

\[ \therefore\ PF = 105 - 0.1008(PD)^{1.4795} \]


Note: (1) Decimals are carried through all calculations and are only rounded at the end. 


(2) Pay Factors, Percents Defective, and A are multiplied by 100 in the equations.


Sample Size n=19 to 25


Percents Defective (100-Quality Level) from Table 4-1 at a Pay Factor of 1.00 and 0.75


PF=1.00 PD=0.13
PF=0.75 PD=0.45
A=1.05

Then using the nonlinear equation from OCPLOT: $PF = A - B(PD)^{C}$

\[
\begin{aligned}
100 &= 105 - B(13)^{C} &\qquad 75 &= 105 - B(45)^{C} \\
B(13)^{C} &= 5 &\qquad B(45)^{C} &= 30 \\
\log B + C \log 13 &= 0.699 &\qquad \log B + C \log 45 &= 1.477
\end{aligned}
\]

\[
\begin{aligned}
0.699 - C \log 13 &= 1.477 - C \log 45 \\
C \log 45 - C \log 13 &= 0.778 \\
C(\log 45 - \log 13) &= 0.778
\end{aligned}
\]

C = 1.4430
B = 0.1235

\[ \therefore\ PF = 105 - 0.1235(PD)^{1.4430} \]


Note: (1) Decimals are carried through all calculations and are only rounded at the end. 


(2) Pay Factors, Percents Defective, and A are multiplied by 100 in the equations. 







Sample Size n=26 to 37 


Percents Defective (100-Quality Level) from Table 4-1 at a Pay Factor of 1.00 and 0.75 


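PF=1.00 PD=0.11
PF=0.75 PD=0.43
A=1.05

Then using the nonlinear equation from OCPLOT: $PF = A - B(PD)^{C}$

\[
\begin{aligned}
100 &= 105 - B(11)^{C} &\qquad 75 &= 105 - B(43)^{C} \\
B(11)^{C} &= 5 &\qquad B(43)^{C} &= 30 \\
\log B + C \log 11 &= 0.699 &\qquad \log B + C \log 43 &= 1.477
\end{aligned}
\]

\[
\begin{aligned}
0.699 - C \log 11 &= 1.477 - C \log 43 \\
C \log 43 - C \log 11 &= 0.778 \\
C(\log 43 - \log 11) &= 0.778
\end{aligned}
\]

C = 1.3143
B = 0.2139

\[ \therefore\ PF = 105 - 0.2139(PD)^{1.3143} \]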


Note: (1) Decimals are carried through all calculations and are only rounded at the end. 


(2) Pay Factors, Percents Defective, and A are multiplied by 100 in the equations.


Sample Size n=38 to 69


Percents Defective (100-Quality Level) from Table 4-1 at a Pay Factor of 1.00 and 0.75


PF=1.00 PD=0.10
PF=0.75 PD=0.41
A=1.05

Then using the nonlinear equation from OCPLOT: $PF = A - B(PD)^{C}$

\[
\begin{aligned}
100 &= 105 - B(10)^{C} &\qquad 75 &= 105 - B(41)^{C} \\
B(10)^{C} &= 5 &\qquad B(41)^{C} &= 30 \\
\log B + C \log 10 &= 0.699 &\qquad \log B + C \log 41 &= 1.477
\end{aligned}
\]

\[
\begin{aligned}
0.699 - C \log 10 &= 1.477 - C \log 41 \\
C \log 41 - C \log 10 &= 0.778 \\
C(\log 41 - \log 10) &= 0.778
\end{aligned}
\]

C = 1.2699
B = 0.2689

\[ \therefore\ PF = 105 - 0.2689(PD)^{1.2699} \]


Note: (1) Decimals are carried through all calculations and are only rounded at the end. 


(2) Pay Factors, Percents Defective, and A are multiplied by 100 in the equations.


Sample Size n=70 to 200


Percents Defective (100-Quality Level) from Table 4-1 at a Pay Factor of 1.00 and 0.75


PF=1.00 PD=0.09
PF=0.75 PD=0.38
A=1.05

Then using the nonlinear equation from OCPLOT: $PF = A - B(PD)^{C}$

\[
\begin{aligned}
100 &= 105 - B(9)^{C} &\qquad 75 &= 105 - B(38)^{C} \\
B(9)^{C} &= 5 &\qquad B(38)^{C} &= 30 \\
\log B + C \log 9 &= 0.699 &\qquad \log B + C \log 38 &= 1.477
\end{aligned}
\]

\[
\begin{aligned}
0.699 - C \log 9 &= 1.477 - C \log 38 \\
C \log 38 - C \log 9 &= 0.778 \\
C(\log 38 - \log 9) &= 0.778
\end{aligned}
\]

C = 1.2440
B = 0.3250

\[ \therefore\ PF = 105 - 0.3250(PD)^{1.2440} \]


Note: (1) Decimals are carried through all calculations and are only rounded at the end. 


(2) Pay Factors, Percents Defective, and A are multiplied by 100 in the equations.


Sample Size n=201 to ∞


Percents Defective (100-Quality Level) from Table 4-1 at a Pay Factor of 1.00 and 0.75


PF=1.00 PD=0.07
PF=0.75 PD=0.35
A=1.05

Then using the nonlinear equation from OCPLOT: $PF = A - B(PD)^{C}$

\[
\begin{aligned}
100 &= 105 - B(7)^{C} &\qquad 75 &= 105 - B(35)^{C} \\
B(7)^{C} &= 5 &\qquad B(35)^{C} &= 30 \\
\log B + C \log 7 &= 0.699 &\qquad \log B + C \log 35 &= 1.477
\end{aligned}
\]

\[
\begin{aligned}
0.699 - C \log 7 &= 1.477 - C \log 35 \\
C \log 35 - C \log 7 &= 0.778 \\
C(\log 35 - \log 7) &= 0.778
\end{aligned}
\]

C = 1.1133
B = 0.5730

\[ \therefore\ PF = 105 - 0.5730(PD)^{1.1133} \]


Note: (1) Decimals are carried through all calculations and are only rounded at the end. 


(2) Pay Factors, Percents Defective, and A are multiplied by 100 in the equations. 
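
The derivations in this appendix can be collected into a single lookup, sketched below in Python. The table layout and the function names are assumptions made for this illustration; the B and C values, and the percents defective paying 100 and 75, are as read from the derivations above. All quantities are in percent, following Note (2).

```python
A = 105.0  # maximum pay factor in percent (A = 1.05 multiplied by 100)

# Each row: (smallest n, largest n or None for no upper limit,
#            PD paying 100, PD paying 75, B, C)
PAY_CURVES = [
    (3, 3, 32, 67, 0.0011, 2.4244),
    (4, 4, 26, 62, 0.0060, 2.0618),
    (5, 5, 22, 59, 0.0182, 1.8163),
    (6, 6, 20, 56, 0.0272, 1.7402),
    (7, 7, 19, 54, 0.0320, 1.7153),
    (8, 8, 18, 53, 0.0413, 1.6592),
    (9, 9, 17, 51, 0.0492, 1.6309),
    (10, 11, 16, 50, 0.0639, 1.5725),
    (12, 14, 15, 49, 0.0830, 1.5136),
    (15, 18, 14, 47, 0.1008, 1.4795),
    (19, 25, 13, 45, 0.1235, 1.4430),
    (26, 37, 11, 43, 0.2139, 1.3143),
    (38, 69, 10, 41, 0.2689, 1.2699),
    (70, 200, 9, 38, 0.3250, 1.2440),
    (201, None, 7, 35, 0.5730, 1.1133),
]


def coefficients_for(n):
    """Return (B, C) for the sample size range that contains n."""
    for n_min, n_max, _pd_100, _pd_75, b, c in PAY_CURVES:
        if n >= n_min and (n_max is None or n <= n_max):
            return b, c
    raise ValueError(f"no pay formula tabulated for n={n}")


def pay_factor(n, pd_percent):
    """Evaluate PF = A - B * PD**C in percent for sample size n."""
    b, c = coefficients_for(n)
    return A - b * pd_percent**c


if __name__ == "__main__":
    # Check how closely each rounded formula reproduces its two anchor points.
    for n_min, _n_max, pd_100, pd_75, _b, _c in PAY_CURVES:
        print(n_min, round(pay_factor(n_min, pd_100), 2),
              round(pay_factor(n_min, pd_75), 2))
```

Because B and C are rounded to four decimal places, the formulas return values near, but not exactly, 100 and 75 at their anchor points; the n=3 row, where B has only two significant figures, shows the largest difference.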